Scrapy Shell - Scrapy

What is the use of Scrapy shell?

Description

The Scrapy shell is an interactive console for trying out scraping code quickly and debugging it without running a spider. Its main use is to test data extraction code, in particular XPath and CSS expressions, against the web pages you intend to scrape, so that you can see how the expressions behave before putting them into a spider.

Configuring the Shell

If the IPython console (used for interactive computing) is installed, the Scrapy shell uses it in place of the standard Python console. IPython is a more powerful interactive shell that provides auto-completion, colorized output, and similar conveniences.

Installing IPython is recommended, especially if you work on a Unix platform. Scrapy also supports bpython and tries to use it when IPython is not available.

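You can choose which shell is used, regardless of what is installed, by setting the SCRAPY_PYTHON_SHELL environment variable or by adding a shell entry to the [settings] section of the scrapy.cfg file. The accepted values are ipython, bpython, and python.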
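Both configuration options can be sketched in a few lines of Python. This is only an illustration: it writes a throwaway scrapy.cfg into a temporary directory (a real project file normally contains other entries too), and the values ipython and bpython are chosen as examples.

```python
import configparser
import os
import tempfile

# Option 1: environment variable, read by Scrapy when the shell starts.
# Set this way, it only affects this process and the processes it launches.
os.environ["SCRAPY_PYTHON_SHELL"] = "ipython"

# Option 2: a "shell" entry under [settings] in scrapy.cfg.
# A temporary directory is used so that no real project file is touched.
config = configparser.ConfigParser()
config["settings"] = {"shell": "bpython"}

cfg_path = os.path.join(tempfile.mkdtemp(), "scrapy.cfg")
with open(cfg_path, "w") as cfg_file:
    config.write(cfg_file)

with open(cfg_path) as cfg_file:
    cfg_text = cfg_file.read()

print(cfg_text)
```

In practice you would usually export the variable in your terminal or edit the project's existing scrapy.cfg by hand; the script form is used here only to show the exact key and section names.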

Launching the Shell

The Scrapy shell is launched with the shell command: scrapy shell <url>

Here, <url> is the URL of the page you want to scrape.
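If you want to start the shell from a script, the command line can be assembled as an argument list. The helper name build_shell_command below is hypothetical, and the sketch only prints the command; the lines that would actually start the interactive shell are left as comments because they require Scrapy to be installed.

```python
import shlex


def build_shell_command(url, extra_args=()):
    """Return the argument list that launches the Scrapy shell on `url`."""
    return ["scrapy", "shell", url, *extra_args]


command = build_shell_command("https://scrapy.org")
print(shlex.join(command))

# To actually start the interactive shell:
#     import subprocess
#     subprocess.run(command)
```

Passing the arguments as a list avoids quoting problems with URLs that contain characters such as & or ?, which a terminal would otherwise interpret.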

Using the Shell

The shell is a regular Python console (or an IPython console, if available) that provides some additional shortcut functions and Scrapy objects, listed below.

Available Shortcuts

The shell offers the following shortcuts:

1. shelp(): Prints help with the list of available objects and shortcuts.

2. fetch(request_or_url): Fetches a new response from the given request or URL and updates all related objects accordingly.

3. view(response): Opens the given response in your local browser for inspection. It adds a base tag to the response body so that external links, such as images and style sheets, display correctly.

Available Scrapy Objects

The shell makes the following Scrapy objects available:

1. crawler: The current Crawler object.

2. spider: The spider that is known to handle the current URL, or a new Spider object if no spider is found for that URL.

3. request: The Request object of the last fetched page.

4. response: The Response object containing the last fetched page.

5. settings: The current Scrapy settings.

Example of Shell Session

Let’s start by scraping the scrapy.org site and then move on to scraping data from reddit.com.

First, launch the shell on the scrapy.org URL, for example with scrapy shell 'http://scrapy.org'.

After fetching the page, the shell prints the list of available objects and shortcuts.

You can then start working with these objects.

Invoking the Shell from Spiders to Inspect Responses

Sometimes you want to inspect the response being processed at a certain point in your spider, if only to check that the response you expect is actually arriving there.

This can be done by calling the scrapy.shell.inspect_response function from the spider callback, passing it the response and the spider itself.

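Now run the spider. When the crawl reaches the inspect_response call, execution pauses and the shell opens with the response already loaded.

You can then check whether your extraction code works by evaluating it against the response object.

If the expression returns an empty result, the extraction code does not match the page. In that case, open the response in your browser with the view(response) shortcut and check whether it is the response you were expecting.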

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
