
Sending e-mail

Although Python makes sending e-mails relatively easy via the smtplib library, Scrapy provides its own facility for sending e-mails. It is very easy to use and is implemented using Twisted non-blocking IO, so it does not interfere with the non-blocking IO of the crawler. It also provides a simple API for sending attachments and is easy to configure with a few settings.

│ │ │
│ │ │

Quick example

│ │ │

There are two ways to instantiate the mail sender. You can instantiate it using the standard __init__ method:

│ │ │
from scrapy.mail import MailSender
│ │ │ @@ -253,33 +253,33 @@
│ │ │  
mailer.send(to=["someone@example.com"], subject="Some subject", body="Some body", cc=["another@example.com"])
│ │ │  

MailSender class reference

│ │ │

MailSender is the preferred class to use for sending emails from Scrapy, as it uses Twisted non-blocking IO, like the rest of the framework.

class scrapy.mail.MailSender(smtphost=None, mailfrom=None, smtpuser=None, smtppass=None, smtpport=None)
Parameters

  • smtphost (str or bytes) – the SMTP host to use for sending the emails. If omitted, the MAIL_HOST setting will be used.

  • mailfrom (str) – the address used to send emails (in the From: header). If omitted, the MAIL_FROM setting will be used.

  • smtpuser – the SMTP user. If omitted, the MAIL_USER setting will be used. If not given, no SMTP authentication will be performed.

  • smtppass (str or bytes) – the SMTP password for authentication.

  • smtpport (int) – the SMTP port to connect to.

  • smtptls (bool) – enforce using SMTP STARTTLS.

  • smtpssl (bool) – enforce using a secure SSL connection.
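
When constructor arguments are omitted, the values come from the project settings. The fragment below is a minimal settings.py sketch of that configuration. Only MAIL_HOST, MAIL_FROM and MAIL_USER are named in the parameter list above; MAIL_PASS, MAIL_PORT, MAIL_TLS and MAIL_SSL are assumed here to be the setting names corresponding to smtppass, smtpport, smtptls and smtpssl, and every value is a placeholder.

```python
# settings.py (sketch): mail-related settings; every value is a placeholder.
MAIL_HOST = "smtp.example.com"    # SMTP host (smtphost)
MAIL_FROM = "scrapy@example.com"  # address used in the From: header (mailfrom)
MAIL_USER = "scrapy-bot"          # SMTP user; without one, no authentication is performed
MAIL_PASS = "change-me"           # assumed setting name for smtppass
MAIL_PORT = 587                   # assumed setting name for smtpport
MAIL_TLS = True                   # assumed setting name for smtptls (STARTTLS)
MAIL_SSL = False                  # assumed setting name for smtpssl
```

Enabling both MAIL_TLS and MAIL_SSL at once is unlikely to be meaningful; this sketch picks STARTTLS on port 587 purely as an illustration.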
│ │ │ classmethod from_settings(settings)[source]
│ │ │

Instantiate using a Scrapy settings object, which will respect │ │ │ @@ -294,25 +294,25 @@ │ │ │

send(to, subject, body, cc=None, attachs=(), mimetype='text/plain', charset=None)

Send email to the given recipients.

│ │ │
│ │ │
Parameters

  • to (str or list) – the e-mail recipients, as a string or as a list of strings

  • subject (str) – the subject of the e-mail

  • cc (str or list) – the e-mail addresses to CC, as a string or as a list of strings

  • body (str) – the e-mail body

  • attachs (collections.abc.Iterable) – an iterable of tuples (attach_name, mimetype, file_object), where attach_name is a string with the name that will appear on the e-mail’s attachment, mimetype is the MIME type of the attachment, and file_object is a readable file object with the contents of the attachment

  • mimetype (str) – the MIME type of the e-mail

  • charset (str) – the character encoding to use for the e-mail contents
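
The attachs parameter is the least obvious of these, so here is a hedged sketch of how the (attach_name, mimetype, file_object) tuples could be built. The helper names build_attachments and send_report, the addresses and the file contents are all hypothetical; mailer is assumed to be an already constructed MailSender, or any object exposing the same send() signature.

```python
import io


def build_attachments():
    """Return an iterable of (attach_name, mimetype, file_object) tuples."""
    csv_bytes = b"product,price\nColor TV,1200\n"
    # file_object only needs to be readable; an in-memory buffer is enough.
    return [("report.csv", "text/csv", io.BytesIO(csv_bytes))]


def send_report(mailer):
    """Send one message with a single attachment through a MailSender-like object."""
    return mailer.send(
        to=["someone@example.com"],
        subject="Crawl report",
        body="See the attached CSV.",
        cc=["another@example.com"],
        attachs=build_attachments(),
        mimetype="text/plain",
        charset="utf-8",
    )
```

Passing the mailer in as an argument keeps the sketch independent of how the sender was instantiated.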
│ │ ├── ./usr/share/doc/python-scrapy-doc/html/topics/exceptions.html │ │ │ @@ -237,15 +237,15 @@ │ │ │
exception scrapy.exceptions.CloseSpider(reason='cancelled')

This exception can be raised from a spider callback to request the spider to be closed/stopped. Supported arguments:

│ │ │
│ │ │
Parameters

reason (str) – the reason for closing

For example:

│ │ │
def parse_page(self, response):
    # response.body is bytes, so the membership test needs a bytes literal
    if b'Bandwidth exceeded' in response.body:
│ │ │ @@ -339,15 +339,15 @@
received in the signal handler that raises the exception. Also, the response object is marked with "download_stopped" in its Response.flags attribute.

│ │ │
│ │ │

Note

│ │ │

fail is a keyword-only parameter, i.e. raising StopDownload(False) or StopDownload(True) will raise a TypeError.
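
To make the keyword-only rule concrete, the sketch below uses a hypothetical stand-in class, StopDownloadSketch, rather than Scrapy's own exception. It only assumes that the real constructor declares fail after a bare `*`, which is what "keyword-only" means in Python; the default value shown is also an assumption.

```python
class StopDownloadSketch(Exception):
    """Hypothetical stand-in used only to illustrate a keyword-only parameter."""

    def __init__(self, *, fail=True):  # the bare * forbids positional use of fail
        super().__init__()
        self.fail = fail


def accepts_positional(value):
    """Return True if the stand-in can be built with `value` passed positionally."""
    try:
        StopDownloadSketch(value)
    except TypeError:
        return False
    return True
```

With this stand-in, StopDownloadSketch(fail=False) works, while accepts_positional(False) reports that the positional form is rejected.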

│ │ │
│ │ │

See the documentation for the bytes_received signal and the Stopping the download of a Response topic for additional information and examples.

│ │ ├── ./usr/share/doc/python-scrapy-doc/html/topics/exporters.html │ │ │ @@ -374,17 +374,17 @@ │ │ │

By default, this method looks for a serializer declared in the item field and returns the result of applying that serializer to the value. If no serializer is found, it returns the value unchanged.

│ │ │
│ │ │
Parameters

  • field (Field object or a dict instance) – the field being serialized. If the source item object does not define field metadata, field is an empty dict.

  • name (str) – the name of the field being serialized

  • value – the value being serialized
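
The default behaviour described above can be sketched in a few lines. This is an illustration, not the library's code: serialize_field_sketch is a hypothetical free function, and it assumes the serializer is stored in the field metadata under the key "serializer".

```python
def serialize_field_sketch(field, name, value):
    """Apply the serializer declared in the field metadata, if there is one."""
    serializer = field.get("serializer")  # assumed metadata key
    if serializer is None:
        return value  # no serializer declared: the value is returned unchanged
    return serializer(value)


# Field metadata declaring a serializer, and an empty dict for a field without metadata.
price_field = {"serializer": lambda v: f"$ {v}"}
plain_field = {}
```

Here name is accepted but unused, mirroring the documented signature so that a custom override could branch on the field name.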
│ │ │
│ │ │
│ │ │ │ │ │ │ │ │
│ │ │ @@ -453,31 +453,31 @@ │ │ │

PythonItemExporter

class scrapy.exporters.PythonItemExporter(*, dont_fail=False, **kwargs)

This is a base class for item exporters that extends BaseItemExporter with support for nested items.

It serializes items to built-in Python types, so that any serialization library (e.g. json or msgpack) can be used on top of it.

XmlItemExporter

class scrapy.exporters.XmlItemExporter(file, item_element='item', root_element='items', **kwargs)

Exports items in XML format to the specified file object.

│ │ │
│ │ │
Parameters

  • file – the file-like object to use for exporting the data. Its write method should accept bytes (a disk file opened in binary mode, an io.BytesIO object, etc.)

  • root_element (str) – the name of the root element in the exported XML.

  • item_element (str) – the name of each item element in the exported XML.
│ │ │
│ │ │
│ │ │

The additional keyword arguments of this __init__ method are passed to the BaseItemExporter __init__ method.
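
To show how root_element and item_element shape the document, here is a rough standard-library sketch. It is not the exporter's implementation: export_xml_sketch is a hypothetical helper, it handles only flat dict items, and it relies on xml.sax.saxutils.XMLGenerator to write bytes to the given file.

```python
import io
from xml.sax.saxutils import XMLGenerator


def export_xml_sketch(items, file, item_element="item", root_element="items"):
    """Write flat dict items as XML to a bytes-accepting file object."""
    xg = XMLGenerator(file, encoding="utf-8")
    xg.startDocument()
    xg.startElement(root_element, {})      # one root element wraps everything
    for item in items:
        xg.startElement(item_element, {})  # one element per exported item
        for name, value in item.items():
            xg.startElement(name, {})
            xg.characters(str(value))
            xg.endElement(name)
        xg.endElement(item_element)
    xg.endElement(root_element)
    xg.endDocument()


buffer = io.BytesIO()
export_xml_sketch([{"name": "Color TV", "price": 1200}], buffer)
xml_bytes = buffer.getvalue()
```

The buffer is an io.BytesIO because, as noted in the parameter list, the file's write method must accept bytes.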

│ │ │

A typical output of this exporter would be:

│ │ │
<?xml version="1.0" encoding="utf-8"?>
│ │ │ @@ -526,28 +526,28 @@
CSV columns and their order. The export_empty_fields attribute has no effect on this exporter.

│ │ │
│ │ │
Parameters

  • file – the file-like object to use for exporting the data. Its write method should accept bytes (a disk file opened in binary mode, an io.BytesIO object, etc.)

  • include_headers_line (str) – If enabled, makes the exporter output a header line with the field names taken from BaseItemExporter.fields_to_export or the first exported item fields.

  • join_multivalued – The char (or chars) that will be used for joining multi-valued fields, if found.

  • errors (str) – The optional string that specifies how encoding and decoding errors are to be handled. For more information see io.TextIOWrapper.
│ │ │
│ │ │
│ │ │

The additional keyword arguments of this __init__ method are passed to the BaseItemExporter __init__ method, and the leftover arguments to the csv.writer() function, so you can use any csv.writer() function argument to customize this exporter.
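
As an illustration of what such leftover arguments do, the sketch below calls csv.writer() directly with a custom delimiter. The helper write_rows_sketch is hypothetical, and wrapping the bytes file in io.TextIOWrapper (where the errors argument applies) is an assumption about how a bytes-accepting file can be combined with the text-based csv module.

```python
import csv
import io


def write_rows_sketch(rows, errors="strict", **csv_kwargs):
    """Write rows as CSV into a bytes buffer and return the encoded bytes."""
    raw = io.BytesIO()
    # csv.writer emits text, so the bytes file is wrapped; `errors` controls
    # how characters that cannot be encoded are handled.
    text = io.TextIOWrapper(
        raw, encoding="utf-8", newline="", errors=errors, write_through=True
    )
    writer = csv.writer(text, **csv_kwargs)  # e.g. delimiter, quotechar, quoting
    writer.writerows(rows)
    text.flush()
    return raw.getvalue()


csv_bytes = write_rows_sketch(
    [["product", "price"], ["Color TV", 1200]], delimiter=";"
)
```

Any other csv.writer() argument, such as quoting=csv.QUOTE_ALL, could be passed the same way.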

│ │ │

A typical output of this exporter would be:

│ │ │
product,price
Color TV,1200
DVD player,200
│ │ │ @@ -561,19 +561,19 @@
class scrapy.exporters.PickleItemExporter(file, protocol=0, **kwargs)

Exports items in pickle format to the given file-like object.

│ │ │
│ │ │
Parameters

  • file – the file-like object to use for exporting the data. Its write method should accept bytes (a disk file opened in binary mode, an io.BytesIO object, etc.)

  • protocol (int) – The pickle protocol to use.

For more information, see pickle.

The additional keyword arguments of this __init__ method are passed to the BaseItemExporter __init__ method.

│ │ │

Pickle isn’t a human readable format, so no output examples are provided.
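
Although the output itself is not readable, a round trip through the pickle module shows what the protocol argument controls. This is a sketch with a hypothetical helper, pickle_round_trip; writing one pickle per item into the same file is an assumption made for the illustration.

```python
import io
import pickle


def pickle_round_trip(items, protocol=0):
    """Pickle each item into one buffer, then read them all back."""
    buf = io.BytesIO()
    for item in items:
        pickle.dump(item, buf, protocol)  # one pickle per item (assumption)
    buf.seek(0)
    loaded = []
    while True:
        try:
            loaded.append(pickle.load(buf))
        except EOFError:  # raised once the buffer is exhausted
            break
    return loaded


items = [{"name": "Color TV", "price": 1200}, {"name": "DVD player", "price": 200}]
restored = pickle_round_trip(items, protocol=pickle.HIGHEST_PROTOCOL)
```

Higher protocols are generally more compact, but they can only be read by Python versions that support them.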

│ │ │
│ │ │ │ │ │
│ │ │
│ │ │ @@ -603,16 +603,16 @@ │ │ │

JsonItemExporter

class scrapy.exporters.JsonItemExporter(file, **kwargs)

Exports items in JSON format to the specified file-like object, writing all objects as a list of objects. The additional __init__ method arguments are passed to the BaseItemExporter __init__ method, and the leftover arguments to the JSONEncoder __init__ method, so you can use any JSONEncoder __init__ method argument to customize this exporter.

│ │ │
│ │ │
Parameters
│ │ │

file – the file-like object to use for exporting the data. Its write method should accept bytes (a disk file opened in binary mode, an io.BytesIO object, etc.)
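
As a hedged illustration of the kind of JSONEncoder arguments that can be forwarded, the sketch below uses the standard json module directly. The chosen options (sort_keys, ensure_ascii) and the sample items are only examples, not a recommendation.

```python
import json

# Arguments of this kind are what would be forwarded to JSONEncoder.
encoder_kwargs = {"sort_keys": True, "ensure_ascii": False}
encoder = json.JSONEncoder(**encoder_kwargs)

items = [
    {"price": 1200, "name": "Color TV"},
    {"price": 200, "name": "DVD player"},
]

# All objects are encoded together as a single JSON list.
encoded = encoder.encode(items)
# The target file must accept bytes, so the text is encoded before writing.
payload = encoded.encode("utf-8")
```

With sort_keys enabled, the keys of each object are emitted in alphabetical order regardless of insertion order.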

│ │ │
│ │ │
│ │ │

A typical output of this exporter would be:

│ │ │ @@ -637,16 +637,16 @@ │ │ │

JsonLinesItemExporter

class scrapy.exporters.JsonLinesItemExporter(file, **kwargs)

Exports items in JSON format to the specified file-like object, writing one JSON-encoded item per line. The additional __init__ method arguments are passed to the BaseItemExporter __init__ method, and the leftover arguments to the JSONEncoder __init__ method, so you can use any JSONEncoder __init__ method argument to customize this exporter.

│ │ │
│ │ │
Parameters
│ │ │

file – the file-like object to use for exporting the data. Its write method should accept bytes (a disk file opened in binary mode, an io.BytesIO object, etc.)
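
The one-item-per-line layout can be sketched with the standard library as follows. write_json_lines_sketch is a hypothetical helper, not the exporter's code, and it assumes UTF-8 output with a newline after each item.

```python
import io
import json


def write_json_lines_sketch(items, file, **encoder_kwargs):
    """Write one JSON-encoded item per line to a bytes-accepting file."""
    encoder = json.JSONEncoder(**encoder_kwargs)
    for item in items:
        line = encoder.encode(item) + "\n"
        file.write(line.encode("utf-8"))  # the file's write method takes bytes


lines_buffer = io.BytesIO()
write_json_lines_sketch(
    [{"name": "Color TV", "price": 1200}, {"name": "DVD player", "price": 200}],
    lines_buffer,
)
json_lines = lines_buffer.getvalue()
```

Because every line is a complete JSON document, such a file can be appended to and read back incrementally, unlike a single JSON list.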

│ │ │
│ │ │
│ │ │

A typical output of this exporter would be:

│ │ │ @@ -661,20 +661,20 @@ │ │ │
│ │ │
│ │ │

MarshalItemExporter

class scrapy.exporters.MarshalItemExporter(file, **kwargs)

Exports items in a Python-specific binary format (see marshal).

│ │ │
│ │ │
Parameters
│ │ │

file – The file-like object to use for exporting the data. Its write method should accept bytes (a disk file opened in binary mode, a BytesIO object, etc.)
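
For reference, this is roughly what writing items with the standard marshal module looks like. marshal_items_sketch is a hypothetical helper, and writing each item with its own marshal.dump() call is an assumption of this sketch.

```python
import io
import marshal


def marshal_items_sketch(items, file):
    """Dump each item to a bytes-accepting file using the marshal format."""
    for item in items:
        marshal.dump(item, file)


marshal_buffer = io.BytesIO()
marshal_items_sketch([{"name": "Color TV", "price": 1200}], marshal_buffer)
# Reading the first (and here only) value back from the raw bytes.
first_item = marshal.loads(marshal_buffer.getvalue())
```

The marshal format is specific to Python and may change between Python versions, so it is best suited to short-lived data.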

│ │ │
│ │ │
│ │ │
│ │ │ │ │ │
│ │ │
│ │ │
│ │ ├── ./usr/share/doc/python-scrapy-doc/html/topics/extensions.html │ │ │ @@ -554,15 +554,15 @@ │ │ │
│ │ │

Debugger extension

class scrapy.extensions.debug.Debugger

Invokes a Python debugger inside a running Scrapy process when a SIGUSR2 signal is received. After the debugger is exited, the Scrapy process continues running normally.

│ │ │

For more info see Debugging in Python.

│ │ │

This extension only works on POSIX-compliant platforms (i.e. not Windows).
