Using the Frontier with Scrapy
Using Frontera is quite easy: it includes a set of Scrapy middlewares and a Scrapy scheduler that encapsulate Frontera usage and can be configured through Scrapy settings.
Activating the frontier
Frontera uses two different middlewares, SchedulerSpiderMiddleware and SchedulerDownloaderMiddleware, and its own scheduler, FronteraScheduler.
To activate Frontera in your Scrapy project, add them to the SPIDER_MIDDLEWARES, DOWNLOADER_MIDDLEWARES and SCHEDULER settings:
SPIDER_MIDDLEWARES.update({
'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 1000,
})
DOWNLOADER_MIDDLEWARES.update({
'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 1000,
})
SCHEDULER = 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler'
Create a Frontera settings.py file and point to it from your Scrapy settings with FRONTERA_SETTINGS, which takes a dotted module path:
FRONTERA_SETTINGS = 'tutorial.frontera.settings'
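A minimal Frontera settings file might look like the following sketch. BACKEND, MAX_REQUESTS and MAX_NEXT_REQUESTS are standard Frontera setting names, but the values shown are illustrative, not recommendations:

```python
# my_scrapy_project/frontera/settings.py -- a minimal sketch.

# Crawling strategy/queue implementation; the memory FIFO backend
# ships with Frontera and is handy for local experiments.
BACKEND = 'frontera.contrib.backends.memory.FIFO'

# Stop the crawl after this many requests (0 means no limit).
MAX_REQUESTS = 2000

# How many requests the backend hands to the scheduler per batch.
MAX_NEXT_REQUESTS = 10
```

Any setting left out here falls back to Frontera's defaults, so a settings file this small is enough to get started.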
Organizing files
When using the frontier with a Scrapy project, we propose the following directory structure:
my_scrapy_project/
    my_scrapy_project/
        frontera/
            __init__.py
            settings.py
            middlewares.py
            backends.py
        spiders/
            ...
        __init__.py
        settings.py
    scrapy.cfg
These are basically:
my_scrapy_project/frontera/settings.py: the Frontera settings file.
my_scrapy_project/frontera/middlewares.py: the middlewares used by Frontera.
my_scrapy_project/frontera/backends.py: the backend(s) used by Frontera.
my_scrapy_project/spiders: the Scrapy spiders folder.
my_scrapy_project/settings.py: the Scrapy settings file.
scrapy.cfg: the Scrapy config file.
Running the Crawl
Just run your Scrapy spider as usual from the command line:
scrapy crawl myspider