Settings¶
The Frontera settings allow you to customize the behaviour of all components, including the FrontierManager, Middleware and Backend themselves.
The settings infrastructure provides a global namespace of key-value mappings from which components can pull configuration values. The settings can be populated through different mechanisms, which are described below.
For a list of available built-in settings see: Built-in settings reference.
Designating the settings¶
When you use Frontera, you have to tell it which settings you’re using. As FrontierManager is the main entry point to Frontier usage, you can do this by using the method described in the Loading from settings section.
When using a string path pointing to a settings file for the frontier, we propose the following directory structure:
my_project/
    frontier/
        __init__.py
        settings.py
        middlewares.py
        backends.py
    ...
These are basically:
frontier/settings.py: the frontier settings file.
frontier/middlewares.py: the middlewares used by the frontier.
frontier/backends.py: the backend(s) used by the frontier.
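As a rough illustration, a frontier/settings.py file might look like the sketch below. It uses only setting names documented on this page; the non-default values (10 and 1000) are illustrative assumptions, not recommendations.

```python
# frontier/settings.py (illustrative values only)

# Backend class path; this is the documented default.
BACKEND = 'frontera.contrib.backends.memory.FIFO'

# Middlewares enabled in the frontier; this is the documented default list.
MIDDLEWARES = [
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]

# Return at most 10 requests per get_next_requests call (0 means no limit).
MAX_NEXT_REQUESTS = 10

# Finish after 1000 returned requests (0 means continue indefinitely).
MAX_REQUESTS = 1000

# Start the frontier automatically.
AUTO_START = True
```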
How to access settings¶
Settings can be accessed through the FrontierManager.settings attribute of the manager that is passed to the Middleware.from_manager and Backend.from_manager class methods:
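from frontera.core.components import Component

class MyMiddleware(Component):

    @classmethod
    def from_manager(cls, manager):
        # The settings live on the manager passed to from_manager.
        settings = manager.settings
        if settings.TEST_MODE:
            print("test mode is enabled!")
        return cls()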
In other words, settings can be accessed as attributes of the Settings object.
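The attribute-style access can be tried without Frontera installed. The following is a minimal sketch in which FakeManager and the SimpleNamespace settings object are hypothetical stand-ins for FrontierManager and Settings; only the access pattern mirrors the text.

```python
from types import SimpleNamespace


class FakeManager:
    """Hypothetical stand-in for FrontierManager; it only exposes `settings`."""

    def __init__(self, **settings):
        # Each keyword argument becomes an attribute of the settings object.
        self.settings = SimpleNamespace(**settings)


class MyMiddleware:
    """Toy component that reads one setting when it is built."""

    def __init__(self, test_mode):
        self.test_mode = test_mode

    @classmethod
    def from_manager(cls, manager):
        # Settings are read as plain attributes of manager.settings.
        settings = manager.settings
        return cls(test_mode=settings.TEST_MODE)


middleware = MyMiddleware.from_manager(FakeManager(TEST_MODE=True))
print(middleware.test_mode)
```

Reading the setting once in from_manager and passing it to the constructor keeps the component itself independent of the manager.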
Settings class¶
Built-in frontier settings¶
Here’s a list of all available Frontera settings, in alphabetical order, along with their default values and the scope where they apply.
AUTO_START¶
Default: True
Whether to start the frontier automatically. See Starting/Stopping the frontier.
BACKEND¶
Default: 'frontera.contrib.backends.memory.FIFO'
The Backend to be used by the frontier. For more info see Activating a backend.
EVENT_LOGGER¶
Default: 'frontera.logger.events.EventLogManager'
The EventLogManager class to be used by the frontier.
MAX_NEXT_REQUESTS¶
Default: 0
The maximum number of requests returned by the get_next_requests API method. If the value is 0 (default), no maximum is applied.
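The "0 means no maximum" rule can be sketched as follows. The helper take_next_requests is hypothetical and is not part of Frontera; it only illustrates how such a limit might be applied to a queue of pending requests.

```python
def take_next_requests(queue, max_next_requests=0):
    """Remove and return items from the front of `queue`.

    A limit of 0 means "no maximum": everything queued is returned.
    """
    if max_next_requests == 0:
        count = len(queue)
    else:
        count = min(max_next_requests, len(queue))
    batch = queue[:count]
    del queue[:count]  # the returned items leave the queue
    return batch


pending = ['a', 'b', 'c', 'd', 'e']
first = take_next_requests(pending, max_next_requests=2)  # limited batch
rest = take_next_requests(pending)                        # no limit
```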
MAX_REQUESTS¶
Default: 0
The maximum number of returned requests after which Frontera is finished. If the value is 0 (default), the frontier will continue indefinitely. See Finishing the frontier.
MIDDLEWARES¶
A list containing the middlewares enabled in the frontier. For more info see Activating a middleware.
Default:
[
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]
REQUEST_MODEL¶
Default: 'frontera.core.models.Request'
The Request model to be used by the frontier.
RESPONSE_MODEL¶
Default: 'frontera.core.models.Response'
The Response model to be used by the frontier.
OVERUSED_SLOT_FACTOR¶
Default: 5.0
The threshold for the ratio (in-progress + queued requests in a slot) / (maximum allowed concurrent downloads per slot). Above this value the slot is considered overused. This affects only the Scrapy scheduler.
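The ratio can be worked through with a small sketch. The function is_slot_overused is hypothetical, and the strict "greater than" comparison is an assumption; the description here does not say whether a load exactly equal to the factor counts as overused.

```python
def is_slot_overused(in_progress, queued, max_concurrent, factor=5.0):
    """Return True when the slot load exceeds `factor`.

    load = (in_progress + queued) / max_concurrent
    The strict comparison is an assumption made for this sketch.
    """
    load = (in_progress + queued) / float(max_concurrent)
    return load > factor


# With 8 concurrent downloads allowed per slot and the default factor 5.0:
light = is_slot_overused(in_progress=8, queued=30, max_concurrent=8)  # load 4.75
heavy = is_slot_overused(in_progress=8, queued=40, max_concurrent=8)  # load 6.0
```

Here the first slot stays under the threshold (4.75) and the second exceeds it (6.0).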
DELAY_ON_EMPTY¶
Default: 30.0
When the backend has no requests to fetch, this delay helps to exhaust the rest of the buffer without hitting the backend on every request. Increase it if calls to your backend take a long time, and decrease it if you need a fast spider bootstrap from seeds.
Built-in fingerprint middleware settings¶
Settings used by the UrlFingerprintMiddleware and DomainFingerprintMiddleware.
URL_FINGERPRINT_FUNCTION¶
Default: frontera.utils.fingerprint.sha1
The function used to calculate the URL fingerprint.
DOMAIN_FINGERPRINT_FUNCTION¶
Default: frontera.utils.fingerprint.sha1
The function used to calculate the domain fingerprint.
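To give a feel for what a SHA-1 based fingerprint function does, here is a minimal sketch built on the standard library. The name sha1_fingerprint is hypothetical, and it is an assumption that frontera.utils.fingerprint.sha1 behaves similarly; the real function may normalize or encode its input differently.

```python
import hashlib


def sha1_fingerprint(key):
    """Return the hex SHA-1 digest of a text key (UTF-8 encoded)."""
    return hashlib.sha1(key.encode('utf-8')).hexdigest()


# The same input always maps to the same 40-character hex string,
# which is what makes the digest usable as an identifier.
url_fingerprint = sha1_fingerprint('http://example.com/')
domain_fingerprint = sha1_fingerprint('example.com')
```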
Default settings¶
If no settings are specified, the frontier will use the built-in defaults. For a complete list of default values see: Built-in settings reference. All default settings can be overridden.
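The override behaviour can be pictured as layering user values on top of a table of defaults. This is a minimal sketch, not Frontera's implementation: DEFAULTS holds a few default values documented on this page, resolve_settings is a hypothetical helper, and treating only upper-case names as settings is an assumption.

```python
# A few documented defaults, used here as the base layer.
DEFAULTS = {
    'AUTO_START': True,
    'BACKEND': 'frontera.contrib.backends.memory.FIFO',
    'MAX_REQUESTS': 0,
}


def resolve_settings(overrides):
    """Return a new dict: defaults first, then user overrides on top."""
    resolved = dict(DEFAULTS)  # copy so the defaults are never mutated
    for name, value in overrides.items():
        if name.isupper():  # assumption: only UPPER_CASE names are settings
            resolved[name] = value
    return resolved


settings = resolve_settings({'MAX_REQUESTS': 500, 'helper': 'ignored'})
```

Any setting that is not overridden keeps its default, so settings['AUTO_START'] is still True while settings['MAX_REQUESTS'] becomes 500.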
Frontier default settings¶
Values:
PAGE_MODEL = 'frontera.core.models.Page'
LINK_MODEL = 'frontera.core.models.Link'
FRONTIER = 'frontera.core.frontier.Frontier'
MIDDLEWARES = [
    'frontera.contrib.middlewares.fingerprint.UrlFingerprintMiddleware',
]
BACKEND = 'frontera.contrib.backends.memory.FIFO'
TEST_MODE = False
MAX_PAGES = 0
MAX_NEXT_PAGES = 0
AUTO_START = True
Fingerprints middleware default settings¶
Values:
URL_FINGERPRINT_FUNCTION = 'frontera.utils.fingerprint.sha1'
DOMAIN_FINGERPRINT_FUNCTION = 'frontera.utils.fingerprint.sha1'
Logging default settings¶
Values:
LOGGER = 'frontera.logger.FrontierLogger'
LOGGING_ENABLED = True
LOGGING_EVENTS_ENABLED = False
LOGGING_EVENTS_INCLUDE_METADATA = True
LOGGING_EVENTS_INCLUDE_DOMAIN = True
LOGGING_EVENTS_INCLUDE_DOMAIN_FIELDS = ['name', 'netloc', 'scheme', 'sld', 'tld', 'subdomain']
LOGGING_EVENTS_HANDLERS = [
    "frontera.logger.handlers.COLOR_EVENTS",
]
LOGGING_MANAGER_ENABLED = False
LOGGING_MANAGER_LOGLEVEL = logging.DEBUG
LOGGING_MANAGER_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_MANAGER",
]
LOGGING_BACKEND_ENABLED = False
LOGGING_BACKEND_LOGLEVEL = logging.DEBUG
LOGGING_BACKEND_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_BACKEND",
]
LOGGING_DEBUGGING_ENABLED = False
LOGGING_DEBUGGING_LOGLEVEL = logging.DEBUG
LOGGING_DEBUGGING_HANDLERS = [
    "frontera.logger.handlers.COLOR_CONSOLE_DEBUGGING",
]
EVENT_LOG_MANAGER = 'frontera.logger.events.EventLogManager'