Scrapy Command Line Tools

What is the use of command-line tools in Scrapy?

Description

The Scrapy command-line tool, known as the 'Scrapy tool', is used to control Scrapy. It provides several commands for different purposes, and each command accepts its own set of arguments and options.

Configuration Settings

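Scrapy looks for configuration parameters in ini-style scrapy.cfg files in the following standard locations:

  • /etc/scrapy.cfg or c:\scrapy\scrapy.cfg for system-wide settings
  • ~/.config/scrapy.cfg ($XDG_CONFIG_HOME) and ~/.scrapy.cfg ($HOME) for global (user-wide) settings
  • scrapy.cfg inside the root of a Scrapy project for project-wide settings

Settings from these files are merged in the listed order of preference: user-wide values override system-wide defaults, and project-wide settings override all others when defined.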
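Scrapy merges these files in the order listed, so a more specific file (user-wide, then project-wide) overrides a more general one. That precedence can be modelled with Python's standard configparser, which reads a list of files in order, skips missing ones, and lets later values win. This is a minimal sketch under that assumption, not Scrapy's own loader; candidate_paths and load_scrapy_cfg are hypothetical helper names.

```python
import configparser
import os
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def candidate_paths(project_root: PathLike) -> List[Path]:
    """Return scrapy.cfg locations in merge order (later entries take precedence).

    Hypothetical helper mirroring the documented locations.
    """
    xdg_home = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return [
        Path("/etc/scrapy.cfg"),            # system-wide (Unix-like systems)
        Path(r"c:\scrapy\scrapy.cfg"),      # system-wide (Windows)
        Path(xdg_home) / "scrapy.cfg",      # user-wide ($XDG_CONFIG_HOME)
        Path.home() / ".scrapy.cfg",        # user-wide ($HOME)
        Path(project_root) / "scrapy.cfg",  # project-wide, highest priority
    ]


def load_scrapy_cfg(paths: Iterable[PathLike]) -> configparser.ConfigParser:
    """Merge ini-style files: missing files are skipped and later values win."""
    parser = configparser.ConfigParser()
    parser.read(list(paths))
    return parser
```

For example, load_scrapy_cfg(candidate_paths(".")).get("settings", "default") would return the settings module of the project in the current directory, provided one of the files defines it.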

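Scrapy can also be configured through the following environment variables:

  • SCRAPY_SETTINGS_MODULE − the Python path of the project's settings module (for example, myproject.settings)
  • SCRAPY_PROJECT − which project entry of the [settings] section in scrapy.cfg to use
  • SCRAPY_PYTHON_SHELL − the Python shell used by scrapy shell (for example, ipython, bpython or python)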
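The first two variables interact with scrapy.cfg. The following is a simplified model, an assumption based on how Scrapy resolves project settings rather than its actual code: if SCRAPY_SETTINGS_MODULE is set, it is used directly; otherwise SCRAPY_PROJECT (defaulting to "default") selects an entry of the [settings] section. The function name resolve_settings_module and the module names in the test are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_settings_module(
    cfg_settings: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the settings module Scrapy would load (simplified model).

    cfg_settings holds the key/value pairs of the [settings] section of
    scrapy.cfg, e.g. {"default": "myproject.settings"}.
    """
    # An explicit SCRAPY_SETTINGS_MODULE takes precedence over scrapy.cfg.
    if "SCRAPY_SETTINGS_MODULE" in environ:
        return environ["SCRAPY_SETTINGS_MODULE"]
    # Otherwise SCRAPY_PROJECT selects a key of [settings]; "default" if unset.
    project = environ.get("SCRAPY_PROJECT", "default")
    return cfg_settings.get(project)
```

This is what makes sharing one root directory between several projects possible: with project1 and project2 listed under [settings], setting SCRAPY_PROJECT=project2 before running scrapy settings --get BOT_NAME makes Scrapy read project2's settings.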

Default Structure of a Scrapy Project

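The default file structure of a Scrapy project, as created by scrapy startproject, looks like this:

  • scrapy.cfg − the project configuration file
  • project_name/ − the project's Python module
  • project_name/items.py − the project's items file
  • project_name/middlewares.py − the project's middlewares file
  • project_name/pipelines.py − the project's pipelines file
  • project_name/settings.py − the project's settings file
  • project_name/spiders/ − the directory where the spiders are placed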

The scrapy.cfg file sits in the project root directory and links the project name to the project's settings.
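A scrapy.cfg generated by scrapy startproject typically contains a [settings] section that maps the project to its settings module and a [deploy] section used when deploying to Scrapyd. The sketch below parses such a file with Python's configparser; the file contents are an assumed template (with its comment header omitted) and project_name is a placeholder.

```python
import configparser
from pathlib import PurePosixPath

# Assumed contents of the scrapy.cfg created by `scrapy startproject project_name`.
SCRAPY_CFG = """\
[settings]
default = project_name.settings

[deploy]
#url = http://localhost:6800/
project = project_name
"""

parser = configparser.ConfigParser()
parser.read_string(SCRAPY_CFG)

# [settings] maps a project key to the dotted path of its settings module.
settings_module = parser.get("settings", "default")
# [deploy] names the project when deploying it to Scrapyd.
deploy_project = parser.get("deploy", "project")

# The dotted module path points at settings.py inside the project package.
settings_file = PurePosixPath(*settings_module.split(".")).with_suffix(".py")
print(settings_module, deploy_project, settings_file)
```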

Using Scrapy Tool

Running scrapy with no arguments prints some usage help and the list of available commands.

Creating a Project

You can use the following command to create a project in Scrapy: scrapy startproject project_name

This creates a new project in a directory called project_name. Next, move into the newly created project directory with cd project_name.
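In a script, these two steps can be reproduced with Python's subprocess module: run scrapy startproject, then pass the new directory as cwd to later commands instead of calling cd. A minimal sketch, assuming the scrapy executable is on PATH; startproject_argv and create_project are hypothetical helpers, while the optional project_dir argument is part of startproject itself.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def startproject_argv(name: str, project_dir: Optional[str] = None) -> List[str]:
    """Arguments for `scrapy startproject <name> [project_dir]`."""
    argv = ["scrapy", "startproject", name]
    if project_dir is not None:
        argv.append(project_dir)
    return argv


def create_project(name: str, parent: Path) -> Path:
    """Create a project under `parent` and return its directory."""
    if shutil.which("scrapy") is None:
        raise RuntimeError("the scrapy command-line tool is not on PATH")
    subprocess.run(startproject_argv(name), cwd=parent, check=True)
    # Passing this path as cwd= to later subprocess calls is the script
    # equivalent of `cd project_name`.
    return parent / name
```

For example, subprocess.run(["scrapy", "list"], cwd=create_project("project_name", Path.cwd()), check=True) would run a project command inside the new project.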

Controlling Projects

You can control and manage a project with the Scrapy tool. For example, to create a new spider, run: scrapy genspider mydomain mydomain.com

Some commands, such as crawl, must be run from inside a Scrapy project. The commands that work only inside a project are listed below.

Scrapy provides several built-in commands for working with your project. To see the list of available commands, run scrapy -h (and scrapy <command> -h for help on a single command). Scrapy then lists the available commands, including these global ones:

  • fetch − It fetches the given URL using the Scrapy downloader and writes the contents to standard output.
  • genspider − It creates a new spider from a predefined template.
  • runspider − It runs a spider contained in a single Python file, without creating a project.
  • settings − It displays the value of a Scrapy setting: the project value inside a project, otherwise Scrapy's default value.
  • shell − It starts the interactive Scrapy shell, optionally for a given URL.
  • startproject − It creates a new Scrapy project.
  • version − It displays the Scrapy version.
  • view − It fetches the URL using the Scrapy downloader and opens the contents in a browser, as Scrapy sees them.
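When driving these global commands from Python, a small helper that turns keyword arguments into long options keeps the calls readable. scrapy_argv is a hypothetical helper, not part of Scrapy; the options used in the examples (settings --get, fetch --nolog, version --verbose) are existing Scrapy options, and running the result still requires the scrapy executable.

```python
from typing import List, Union


def scrapy_argv(command: str, *args: str, **options: Union[str, int, bool, None]) -> List[str]:
    """Build ["scrapy", command, options..., args...] for subprocess.run().

    Keyword options become long flags: nolog=True gives --nolog,
    get="BOT_NAME" gives --get BOT_NAME. Options set to False or None are
    left out, and underscores in option names become hyphens.
    """
    argv = ["scrapy", command]
    for name, value in options.items():
        if value is None or value is False:
            continue
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # boolean switch, no value
        else:
            argv.extend([flag, str(value)])
    argv.extend(args)  # positional arguments (URLs, spider names) go last
    return argv
```

For instance, subprocess.run(scrapy_argv("settings", get="BOT_NAME"), check=True) prints the value of the BOT_NAME setting.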

The following commands are project-specific and must be run inside a Scrapy project:

  • crawl − It starts crawling with the given spider.
  • check − It runs the contract checks defined for the spiders.
  • list − It lists the spiders available in the current project.
  • edit − It opens the given spider in the editor set by the EDITOR environment variable or setting.
  • parse − It fetches the given URL and parses it with the spider that handles it.
  • bench − It runs a quick benchmark test (the benchmark reports how many pages per minute Scrapy can crawl on your machine).
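Scrapy decides whether it is inside a project mainly by looking for a scrapy.cfg file in the current directory or one of its parents (it can also rely on SCRAPY_SETTINGS_MODULE, which this sketch ignores). The sketch below models that check for the project-related commands listed above; PROJECT_ONLY_COMMANDS, find_scrapy_cfg and can_run_here are hypothetical names, and the real tool performs its own check.

```python
from pathlib import Path
from typing import Optional, Union

# Project-related commands, following the list above.
PROJECT_ONLY_COMMANDS = frozenset({"crawl", "check", "list", "edit", "parse", "bench"})


def find_scrapy_cfg(start: Union[str, Path]) -> Optional[Path]:
    """Return the nearest scrapy.cfg in `start` or any of its parent directories."""
    here = Path(start).resolve()
    for directory in (here, *here.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


def can_run_here(command: str, cwd: Union[str, Path]) -> bool:
    """Global commands run anywhere; project-only ones need a scrapy.cfg above cwd."""
    if command not in PROJECT_ONLY_COMMANDS:
        return True
    return find_scrapy_cfg(cwd) is not None
```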

Custom Project Commands

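You can add custom project commands with the COMMANDS_MODULE setting of a Scrapy project. Its default value is an empty string, meaning no custom commands. To use your own commands, point it at a Python package, for example COMMANDS_MODULE = 'mybot.commands' in settings.py.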
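Scrapy's built-in commands each live in a module named after the command (crawl.py provides crawl), and a custom commands package follows the same convention: in a hypothetical mybot.commands package, a module hello.py defining a ScrapyCommand subclass would become scrapy hello. The sketch below lists the names such a package directory would expose. custom_command_names is a hypothetical helper and a simplified model: it assumes each top-level module defines exactly one command class and does not check that it really does.

```python
import pkgutil
from pathlib import Path
from typing import List, Union


def custom_command_names(commands_dir: Union[str, Path]) -> List[str]:
    """List the command names a COMMANDS_MODULE package directory would provide.

    Simplified model: the command name is the module's file name without .py;
    __init__.py and subpackages are skipped.
    """
    return sorted(
        info.name
        for info in pkgutil.iter_modules([str(commands_dir)])
        if not info.ispkg
    )
```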

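Scrapy commands can also be added from an external library by declaring a scrapy.commands entry point in that library's setup.py file. Each entry maps a command name to the class that implements it; for example, an entry named cmd_demo adds a scrapy cmd_demo command once the library is installed.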
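Entry points declared this way are recorded in the installed package's metadata, so Python's standard importlib.metadata can list them. The sketch below shows the setup.py declaration as a comment and a hypothetical helper, scrapy_command_entry_points, that returns every installed scrapy.commands entry point. The package name scrapy-module_demo and the target my_module.commands:CmdDemo are placeholders.

```python
import sys
from importlib.metadata import entry_points
from typing import Dict

# Hypothetical setup.py of a package that contributes a "cmd_demo" command:
#
#     from setuptools import setup
#
#     setup(
#         name="scrapy-module_demo",
#         entry_points={
#             "scrapy.commands": [
#                 "cmd_demo = my_module.commands:CmdDemo",
#             ],
#         },
#     )


def scrapy_command_entry_points() -> Dict[str, str]:
    """Map installed scrapy.commands entry point names to their targets."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group="scrapy.commands")
    else:  # Python 3.8 and 3.9 return a dict keyed by group name
        eps = entry_points().get("scrapy.commands", ())
    return {ep.name: ep.value for ep in eps}
```

With the package above installed, the result would include "cmd_demo": "my_module.commands:CmdDemo".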

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
