Scrapy Feed exports

What is the use of Feed exports in Scrapy?

Description

Feed exports are a way of storing the data scraped from sites by generating an "export file" (an export feed) that other systems can consume.

Serialization Formats

Feed exports use Item exporters to generate a feed from the scraped items. They support multiple serialization formats and storage backends.

The table below shows the supported formats:

| Sr.No | Format | FEED_FORMAT value | Exporter used |
|---|---|---|---|
| 1 | JSON | json | scrapy.exporters.JsonItemExporter |
| 2 | JSON lines | jsonlines | scrapy.exporters.JsonLinesItemExporter |
| 3 | CSV | csv | scrapy.exporters.CsvItemExporter |
| 4 | XML | xml | scrapy.exporters.XmlItemExporter |
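To make the difference between the JSON and JSON lines formats concrete, the sketch below builds both output shapes with Python's standard json module. It does not call Scrapy's exporters, the items list is made-up sample data, and the exact whitespace Scrapy writes may differ.

```python
import json

# Hypothetical scraped items (sample data, not from a real crawl).
items = [
    {"name": "Color TV", "price": "1200"},
    {"name": "DVD player", "price": "200"},
]

# JSON: the whole feed is a single array, so it has to be parsed in one piece.
json_feed = json.dumps(items)

# JSON lines: one independent JSON object per line, so each line can be
# parsed on its own, which suits streaming and large feeds.
jsonlines_feed = "\n".join(json.dumps(item) for item in items)

print(json_feed)
print(jsonlines_feed)
```

For large crawls, the line-per-item shape is usually easier to process incrementally than a single array.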

The supported formats can also be extended through the FEED_EXPORTERS setting. Scrapy additionally ships exporters for the following formats:

| Sr.No | Format | FEED_FORMAT value | Exporter used |
|---|---|---|---|
| 1 | Pickle | pickle | scrapy.exporters.PickleItemExporter |
| 2 | Marshal | marshal | scrapy.exporters.MarshalItemExporter |
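As a rough illustration of extending the formats, a project could register its own exporter under a new format name in FEED_EXPORTERS. In this sketch the format name "tsv" and the class path myproject.exporters.TsvItemExporter are hypothetical; the class would be an item exporter written in your own project.

```python
# settings.py (fragment)
# "tsv" and the class path below are hypothetical placeholders for an
# item exporter defined in your own project.
FEED_EXPORTERS = {
    "tsv": "myproject.exporters.TsvItemExporter",
}

# Select the newly registered format for the feed.
FEED_FORMAT = "tsv"
```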

Storage Backends

The storage backend defines where the feed is stored. It is selected through the scheme of the feed URI.

The table below shows the supported storage backends:

| Sr.No | Storage backend | URI scheme | Description |
|---|---|---|---|
| 1 | Local filesystem | file | Feeds are stored on the local filesystem. |
| 2 | FTP | ftp | Feeds are stored on an FTP server. |
| 3 | S3 | s3 | Feeds are stored on Amazon S3. The external library botocore or boto is required. |
| 4 | Standard output | stdout | Feeds are written to the standard output. |
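The sketch below shows what a feed URI for each backend might look like and how the scheme identifies the backend. The host name, credentials, bucket name and paths are placeholders, and the scheme is extracted with Python's standard urllib.parse rather than with Scrapy itself.

```python
from urllib.parse import urlparse

# Example feed URIs; hosts, credentials, bucket and paths are placeholders.
feed_uris = [
    "file:///tmp/export.csv",
    "ftp://user:pass@ftp.example.com/path/to/export.csv",
    "s3://mybucket/path/to/export.csv",
    "stdout:",
]

# The scheme (the part before the first colon) identifies the backend.
schemes = [urlparse(uri).scheme for uri in feed_uris]
print(schemes)
```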

Storage URI Parameters

The storage URI can contain the following parameters, which are replaced when the feed is created:

  • %(time)s: This parameter is replaced by a timestamp.
  • %(name)s: This parameter is replaced by the spider name.
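The substitution of these parameters can be sketched with ordinary Python %-formatting against a dictionary. The spider name "quotes", the FTP host and the timestamp format used here are assumptions for illustration; the exact timestamp Scrapy produces may look different.

```python
from datetime import datetime, timezone

# URI template using the two parameters described above.
# The host, credentials and path are placeholders.
uri_template = (
    "ftp://user:password@ftp.example.com/scraping/feeds/%(name)s/%(time)s.json"
)

# Hypothetical values: a spider called "quotes" and the current UTC time.
# The timestamp format is an assumption made for this sketch.
params = {
    "name": "quotes",
    "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S"),
}

# %-style formatting with a mapping fills in the named parameters.
feed_uri = uri_template % params
print(feed_uri)
```

Putting the spider name in the directory and the timestamp in the file name gives every run of every spider its own file.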

Settings

The table below shows the settings used to configure Feed exports:

| Sr.No | Setting | Description |
|---|---|---|
| 1 | FEED_URI | The URI of the export feed. Setting it enables feed exports. |
| 2 | FEED_FORMAT | The serialization format used for the feed. |
| 3 | FEED_EXPORT_FIELDS | Defines the fields that should be exported. |
| 4 | FEED_STORE_EMPTY | Defines whether to export feeds with no items. |
| 5 | FEED_STORAGES | A dictionary of additional feed storage backends. |
| 6 | FEED_STORAGES_BASE | A dictionary of the built-in feed storage backends. |
| 7 | FEED_EXPORTERS | A dictionary of additional feed exporters. |
| 8 | FEED_EXPORTERS_BASE | A dictionary of the built-in feed exporters. |
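A minimal settings.py fragment combining the most common of these settings might look like the sketch below. The path and the field names "name" and "price" are illustrative assumptions. Recent Scrapy releases replace FEED_URI and FEED_FORMAT with a single FEEDS setting, so check the documentation for the version you run.

```python
# settings.py (fragment); all values are illustrative.

# Where to store the feed. Setting this enables feed exports.
FEED_URI = "file:///tmp/%(name)s-%(time)s.csv"

# Serialization format of the feed.
FEED_FORMAT = "csv"

# Fields to export. "name" and "price" are hypothetical item fields.
FEED_EXPORT_FIELDS = ["name", "price"]

# Do not export a feed when no items were scraped.
FEED_STORE_EMPTY = False
```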
