Frontera
Frontera at a glance
1. Create your crawler
2. Integrate your crawler with the frontier
3. Choose your backend
4. Run the spider
What else?
What’s next?
Installation Guide
What is Frontera?
Architecture overview
Overview
Components
Data Flow
Frontier objects
Request objects
Response objects
Identifying unique objects
Adding additional data to objects
Frontera API
Frontera API / Manager
Loading from settings
Frontier Manager
Starting/Stopping the frontier
Frontier iterations
Finishing the frontier
Component objects
Test mode
Other ways of using the frontier
Settings
Designating the settings
How to access settings
Settings class
Built-in frontier settings
Built-in fingerprint middleware settings
Default settings
Middlewares
Activating a middleware
Writing your own middleware
Built-in middleware reference
Backends
Activating a backend
Writing your own backend
Built-in backend reference
Using the Frontier with Scrapy
Activating the frontier
Organizing files
Running the Crawl
Frontier Scrapy settings
Using the Frontier with Requests
Graph Manager
Defining a Site Graph
Using the Graph Manager
CrawlPage objects
Adding pages and Links
Adding multiple sites
Graphs Database
Using graphs with status codes
A simple crawl faking example
Rendering graphs
How to use it
Testing a Frontier
Creating a Frontier Tester
Running a Test
Test Parameters
An example of use
Recording a Scrapy crawl
Activating the recorder
Choosing your storage engine
Running the Crawl
Recorder settings
Scrapy Seed Loaders
Activating a Seed loader
FileSeedLoader
S3SeedLoader
Examples
requests
scrapy_frontier
scrapy_recording
scripts
Best practices
Efficient parallel downloading
Tests
Running tests
Writing tests
Backend testing
Testing backend sequences
Testing basic algorithms
Release Notes
0.2.0 (released 2015-01-12)
0.1
Index
A | B | C | D | E | F | G | H | I | L | M | N | O | P | R | S | T | U
A
add_seeds() (frontera.core.components.Backend method)
(frontera.core.components.Component method)
(frontera.core.components.Middleware method)
(frontera.core.manager.FrontierManager method)
AUTO_START
setting
auto_start (frontera.core.manager.FrontierManager attribute)
B
BACKEND
setting
Backend (class in frontera.core.components)
backend (frontera.core.manager.FrontierManager attribute)
body (frontera.core.models.Response attribute)
C
Component (class in frontera.core.components)
cookies (frontera.core.models.Request attribute)
CrawlPage (built-in class)
D
DELAY_ON_EMPTY
setting
DOMAIN_FINGERPRINT_FUNCTION
setting
DomainFingerprintMiddleware (class in frontera.contrib.middlewares.fingerprint)
DomainMiddleware (class in frontera.contrib.middlewares.domain)
E
event_log_manager (frontera.core.manager.FrontierManager attribute)
EVENT_LOGGER
setting
EventLogger (built-in class)
F
finished (frontera.core.manager.FrontierManager attribute)
from_manager() (frontera.core.components.Backend method)
(frontera.core.components.Component class method)
(frontera.core.components.Middleware method)
from_settings() (frontera.core.manager.FrontierManager class method)
frontera.contrib.backends.memory.BASE (built-in class)
frontera.contrib.backends.memory.BFS (built-in class)
frontera.contrib.backends.memory.DFS (built-in class)
frontera.contrib.backends.memory.FIFO (built-in class)
frontera.contrib.backends.memory.LIFO (built-in class)
frontera.contrib.backends.memory.RANDOM (built-in class)
frontera.contrib.backends.sqlalchemy.BASE (built-in class)
frontera.contrib.backends.sqlalchemy.BFS (built-in class)
frontera.contrib.backends.sqlalchemy.DFS (built-in class)
frontera.contrib.backends.sqlalchemy.FIFO (built-in class)
frontera.contrib.backends.sqlalchemy.LIFO (built-in class)
frontera.contrib.backends.sqlalchemy.RANDOM (built-in class)
FRONTERA_SETTINGS
setting
frontier_start() (frontera.core.components.Backend method)
(frontera.core.components.Component method)
(frontera.core.components.Middleware method)
frontier_stop() (frontera.core.components.Backend method)
(frontera.core.components.Component method)
(frontera.core.components.Middleware method)
FrontierManager (class in frontera.core.manager)
G
get_next_requests() (frontera.core.components.Backend method)
(frontera.core.manager.FrontierManager method)
H
headers (frontera.core.models.Request attribute)
(frontera.core.models.Response attribute)
I
id (CrawlPage attribute)
is_seed (CrawlPage attribute)
iteration (frontera.core.manager.FrontierManager attribute)
L
links (CrawlPage attribute)
LOGGER
setting
Logger (built-in class)
logger (frontera.core.manager.FrontierManager attribute)
M
MAX_NEXT_REQUESTS
setting
max_next_requests (frontera.core.manager.FrontierManager attribute)
MAX_REQUESTS
setting
max_requests (frontera.core.manager.FrontierManager attribute)
meta (frontera.core.models.Request attribute)
(frontera.core.models.Response attribute)
method (frontera.core.models.Request attribute)
Middleware (class in frontera.core.components)
MIDDLEWARES
setting
middlewares (frontera.core.manager.FrontierManager attribute)
N
n_requests (frontera.core.manager.FrontierManager attribute)
name (frontera.core.components.Component attribute)
O
OVERUSED_SLOT_FACTOR
setting
P
page_crawled() (frontera.core.components.Backend method)
(frontera.core.components.Component method)
(frontera.core.components.Middleware method)
(frontera.core.manager.FrontierManager method)
R
RECORDER_ENABLED
setting
RECORDER_STORAGE_CLEAR_CONTENT
setting
RECORDER_STORAGE_DROP_ALL_TABLES
setting
RECORDER_STORAGE_ENGINE
setting
referers (CrawlPage attribute)
Request (class in frontera.core.models)
request (frontera.core.models.Response attribute)
request_error() (frontera.core.components.Backend method)
(frontera.core.components.Component method)
(frontera.core.components.Middleware method)
(frontera.core.manager.FrontierManager method)
REQUEST_MODEL
setting
request_model (frontera.core.manager.FrontierManager attribute)
Response (class in frontera.core.models)
RESPONSE_MODEL
setting
response_model (frontera.core.manager.FrontierManager attribute)
S
setting
AUTO_START
BACKEND
DELAY_ON_EMPTY
DOMAIN_FINGERPRINT_FUNCTION
EVENT_LOGGER
FRONTERA_SETTINGS
LOGGER
MAX_NEXT_REQUESTS
MAX_REQUESTS
MIDDLEWARES
OVERUSED_SLOT_FACTOR
RECORDER_ENABLED
RECORDER_STORAGE_CLEAR_CONTENT
RECORDER_STORAGE_DROP_ALL_TABLES
RECORDER_STORAGE_ENGINE
REQUEST_MODEL
RESPONSE_MODEL
TEST_MODE
URL_FINGERPRINT_FUNCTION
Settings (class in frontera.settings)
settings (frontera.core.manager.FrontierManager attribute)
start() (frontera.core.manager.FrontierManager method)
status (CrawlPage attribute)
status_code (frontera.core.models.Response attribute)
stop() (frontera.core.manager.FrontierManager method)
T
TEST_MODE
setting
test_mode (frontera.core.manager.FrontierManager attribute)
U
url (CrawlPage attribute)
(frontera.core.models.Request attribute)
(frontera.core.models.Response attribute)
URL_FINGERPRINT_FUNCTION
setting
UrlFingerprintMiddleware (class in frontera.contrib.middlewares.fingerprint)