[//]: # (title: Deliver customized newsletters from RSS feeds with Platypush)
[//]: # (description: Use the RSS and email integrations to create automated newsletters.)
[//]: # (image: /img/newsletter-1.png)
[//]: # (author: Fabio Manganiello < fabio @ platypush . tech > )
[//]: # (published: 2020-09-06)
I’ve always been a supporter of well-curated newsletters. They give me an opportunity to get a good overview of what
happened in the fields I follow within the span of a day, a week or a month. However, not all newsletters fit this
category. Some don’t think twice before selling email addresses to third parties, and within the blink of an eye
your mailbox can get flooded with messages that you didn’t request. Others may sign up your address for other
services or newsletters as well, and they often don’t offer much granularity to configure which communications you want
to receive. Even in the best-case scenario, the most privacy-savvy user may still hesitate before signing up for a
newsletter: you’re giving your personal email address to someone you don’t necessarily trust, implying “yes, this
is my address and I’m interested in this subject”. Additionally, most newsletters spice up their URLs with
tracking parameters so they can easily measure user engagement, something you may not be happy with.
Moreover, the customization junkie may also have a valid use case for a more finely tuned selection of content in their
newsletter: you may want to group some sources together into the same daily or weekly email, you may be interested only
in a particular subset of the subjects covered by a newsletter and want to filter out the rest, or you may want to
customize the style of the digest that gets delivered. Finally, a fully automated way to deliver newsletters through five
lines of code and a couple of tuned parameters is the nirvana for many companies of every size out there.
## Feed up the newsletter
Those who have read my articles in the past may know that I’m an avid consumer of RSS feeds. Despite being a 21-year-old
technology, RSS feeds do their job very well when it comes to delivering the information that matters without all the
noise and trackers, and, being simple XML documents, they are very easy to integrate with other tools. However, in spite
of all the effort I put into staying up to date with all my sources, a lot of potentially interesting content inevitably
slips through, and that’s where newsletters step in: they filter and group together all the content that was generated
in a given time frame and periodically deliver it to your inbox.
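Being plain XML, an RSS feed can be consumed with nothing more than a standard XML parser. As a minimal illustration (not part of the Platypush setup described below), here is a Python sketch that uses the standard library's `xml.etree.ElementTree` to extract the title, link and publication date of each item. `parse_items` is a hypothetical helper and the feed content is a hand-written sample, not real data.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# A tiny, hand-written RSS 2.0 document used for illustration only
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First article</title>
      <link>https://example.com/first</link>
      <pubDate>Sun, 06 Sep 2020 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second article</title>
      <link>https://example.com/second</link>
      <pubDate>Sun, 06 Sep 2020 12:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text: str) -> List[Dict[str, str]]:
    """Return the title, link and pubDate of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter('item'):
        items.append({
            'title': item.findtext('title', default=''),
            'link': item.findtext('link', default=''),
            'published': item.findtext('pubDate', default=''),
        })
    return items


if __name__ == '__main__':
    for entry in parse_items(SAMPLE_FEED):
        print(entry['title'], '->', entry['link'])
```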
My ideal solution would combine the best aspects of both worlds: the flexibility of an RSS subscription, fine-grained
filtering and aggregation of content and sources, and delivery of the full package to my door in whichever format I
like (HTML, PDF, MOBI…). In this article I’m going to show how to achieve this goal with a few tools:
- One or more sources that you want to track and that support RSS feeds (in this example I’ll use the [MIT Technology
Review RSS feed](https://www.technologyreview.com/feed/), but the procedure works for any RSS feed).
- An email address.
- [Platypush](https://git.platypush.tech/platypush/platypush) to do the heavy lifting: monitor the RSS sources at
custom intervals, trigger events when a source has new content, create a digest out of the new content, and
deliver the full package to a list of email addresses.
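Conceptually, the “trigger events when a source has new content” step boils down to comparing each item’s publication date with the time of the previous poll. The sketch below is a hypothetical, stdlib-only illustration of that idea, not Platypush’s internal logic: `new_items` is a name made up for this example, and each item is assumed to carry its RSS `pubDate` string (RFC 822 format) under a `published` key.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List


def new_items(items: List[Dict[str, str]], last_poll: datetime) -> List[Dict[str, str]]:
    """Keep only the items published after the previous poll.

    RSS 2.0 dates use the RFC 822 format, which the standard library
    parses through email.utils.parsedate_to_datetime.
    """
    fresh = []
    for item in items:
        published = parsedate_to_datetime(item['published'])
        if published > last_poll:
            fresh.append(item)
    return fresh


if __name__ == '__main__':
    items = [
        {'title': 'Old news', 'published': 'Sat, 05 Sep 2020 08:00:00 +0000'},
        {'title': 'Fresh news', 'published': 'Sun, 06 Sep 2020 09:15:00 +0000'},
    ]
    last_poll = datetime(2020, 9, 6, 0, 0, tzinfo=timezone.utc)
    for item in new_items(items, last_poll):
        print(item['title'])  # prints "Fresh news"
```

Whatever the actual bookkeeping looks like, the digest is then built only from the items that pass this kind of filter.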
Let’s cover these points step by step.
## Installing and configuring Platypush
We’ll be using the [`http.poll`](https://docs.platypush.tech/en/latest/platypush/backend/http.poll.html) backend
configured with one or more `RssUpdates` objects to poll our RSS sources at regular intervals and create the digests, and
either the [`mail.smtp`](https://docs.platypush.tech/en/latest/platypush/plugins/mail.smtp.html) plugin or the
[`google.mail`](https://docs.platypush.tech/en/latest/platypush/plugins/google.mail.html) plugin to send the
digests to our email.
You can install Platypush on any device where you want to run your logic: a Raspberry Pi, an old laptop, a cloud node,
and so on. We will install the base package with the `rss` module. Optionally, you can also install the `pdf` module
(if you want to export your digests to PDF) or the `google` module (if you want to send the newsletter from
a GMail address instead of through an SMTP server).
The first option is to install the latest stable version through `pip`:
```shell
[sudo] pip install 'platypush[rss,pdf,google]'
```
The other option is to install the latest git version:
```shell
git clone https://git.platypush.tech/platypush/platypush
cd platypush
[sudo] pip install '.[rss,pdf,google]'
```
## Monitoring your RSS feeds
Once the software is installed, create the configuration file `~/.config/platypush/config.yaml` if it doesn't exist
already and add the configuration for the RSS monitor:
```yaml
# Generic HTTP endpoint monitor
backend.http.poll:
  requests:
    # Add a new RSS feed to the pool
    - type: platypush.backend.http.request.rss.RssUpdates
      # URL to the RSS feed
      url: https://www.technologyreview.com/feed/
      # Title of the feed (shown in the head of the digest)
      title: MIT Technology Review
      # How often we should monitor this source (24*60*60 secs = once a day)
      poll_seconds: 86400
      # Format of the digest (HTML or PDF)
      digest_format: html
```
You can also add more sources to the `http.poll` `requests` object, each with its own configuration. Also, you can
customize the style of your digest by passing some valid CSS to these configuration attributes:
```yaml
# Style of the body element
body_style: 'font-size: 20px; font-family: "Merriweather", Georgia, "Times New Roman", Times, serif'
# Style of the main title
title_style: 'margin-top: 30px'
# Style of the subtitle
subtitle_style: 'margin-top: 10px; page-break-after: always'
# Style of the article titles
article_title_style: 'font-size: 1.6em; margin-top: 1em; padding-top: 1em; border-top: 1px solid #999 '
# Style of the article link
article_link_style: 'color: #555 ; text-decoration: none; border-bottom: 1px dotted font-size: 0.8em'
# Style of the article content
article_content_style: 'font-size: 0.8em'
```
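To make the role of these attributes more concrete, here is a hypothetical sketch of how per-element CSS strings could be applied as inline `style` attributes when rendering an HTML digest. This is not Platypush’s actual template: `render_digest` and the markup it emits are assumptions for illustration, and only the attribute names mirror the configuration above.

```python
from html import escape
from typing import Dict, List


def render_digest(title: str, items: List[Dict[str, str]], styles: Dict[str, str]) -> str:
    """Render a minimal HTML digest, applying each configured CSS string inline."""
    def style(key: str) -> str:
        # Missing keys simply produce no style attribute; quotes in the CSS
        # (e.g. font family names) are escaped so the attribute stays valid
        css = styles.get(key)
        return f' style="{escape(css, quote=True)}"' if css else ''

    parts = [f'<html><body{style("body_style")}>',
             f'<h1{style("title_style")}>{escape(title)}</h1>']
    for item in items:
        parts.append(f'<h2{style("article_title_style")}>{escape(item["title"])}</h2>')
        parts.append(f'<a href="{escape(item["link"], quote=True)}"'
                     f'{style("article_link_style")}>{escape(item["link"])}</a>')
        parts.append(f'<div{style("article_content_style")}>'
                     f'{escape(item.get("summary", ""))}</div>')
    parts.append('</body></html>')
    return '\n'.join(parts)


if __name__ == '__main__':
    styles = {
        'body_style': 'font-size: 20px; font-family: "Merriweather", Georgia, serif',
        'article_link_style': 'color: #555; text-decoration: none',
    }
    print(render_digest('MIT Technology Review',
                        [{'title': 'An article', 'link': 'https://example.com/a'}],
                        styles))
```

Inline styles are a deliberate choice here: many email clients ignore `<style>` blocks, so per-element attributes tend to survive delivery better.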
The `digest_format` attribute determines the output format of your digest: choose `html` if you want to
deliver a summary of the articles in a newsletter, or `pdf` if you want to deliver the full content of each item
as an attachment to an email address. Bonus: since you can send PDFs to a Kindle once you have
[configured its email address](https://www.amazon.com/gp/sendtokindle/email),
this mechanism allows you to deliver the full digest of your RSS feeds straight to your Kindle.
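Since Send-to-Kindle simply expects a document attached to an email, delivering a PDF digest boils down to building a message with a single attachment. A minimal standard-library sketch, assuming the PDF is already on disk; `build_pdf_digest_message` is a hypothetical helper.

```python
from email.message import EmailMessage
from pathlib import Path


def build_pdf_digest_message(pdf_path: str, sender: str,
                             recipient: str, subject: str) -> EmailMessage:
    """Wrap a PDF digest into an email as a single attachment."""
    path = Path(pdf_path)
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    # Short plain-text body; the digest itself travels as the attachment
    msg.set_content('Your feed digest is attached.')
    msg.add_attachment(path.read_bytes(), maintype='application',
                       subtype='pdf', filename=path.name)
    return msg
```

In practice you don’t need to build the message yourself: the Platypush mail plugins accept a list of attachments directly.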
The [`RssUpdates`](https://github.com/BlackLight/platypush/blob/ac02becba80fafce39d5bbcfc682f7a8fe46f529/platypush/backend/http/request/rss/__init__.py#L21)
object also provides native integration with the [Mercury Parser API](https://github.com/postlight/mercury-parser-api)
to automatically scrape the content of a web page. I covered some of these concepts in
my [previous article](https://blog.platypush.tech/article/Deliver-articles-to-your-favourite-e-reader-using-Platypush)
on how to parse RSS feeds and send the PDF digest to your e-reader. The same mechanism works well for newsletters too.
If you want to parse the full content of the newsletter items as well, all you have to do is configure
the [`http.webpage`](https://docs.platypush.tech/en/latest/platypush/plugins/http.webpage.html) Platypush
plugin. Since the Mercury API doesn't provide a Python binding, this requires a couple of JavaScript dependencies:
```shell
# Install Node and NPM, e.g. on Debian:
apt-get install nodejs npm
# Install the Mercury Parser API
npm install [-g] @postlight/mercury-parser
# Make sure that the Platypush PDF module dependencies
# are installed if you plan HTML->PDF conversion
pip install 'platypush[pdf]'
```
Then, if you want to parse the full content of the items and generate a PDF digest out of them, change your `http.poll`
configuration to something like this:
```yaml
backend.http.poll:
  requests:
    - type: platypush.backend.http.request.rss.RssUpdates
      url: https://www.technologyreview.com/feed/
      title: MIT Technology Review
      poll_seconds: 86400
      # PDF digest format
      digest_format: pdf
      # Extract the full content of the items
      extract_content: True
```
**WARNING**: Extracting the full content of the articles in an RSS feed has two limitations, a practical one and a
legal one:
- Some websites may require user login before displaying the full content of an article. Some websites perform such
checks client-side, and the parser API can usually circumvent them, especially if the full content of an article is
actually just hidden behind a client-side paywall. Other websites, however, also implement their user checks server-side
before sending the content to the client, and in those cases the parser API may return only part of the content or
no content at all.
- Always keep in mind that parsing the full content of an article behind a paywall may infringe intellectual property
rights in some jurisdictions, so make sure to do it only for content that is either free or that you have permission
to scrape.
## Configuring the mail delivery
When new content is published on a subscribed RSS feed, Platypush will generate
a [`NewFeedEvent`](https://docs.platypush.tech/en/latest/platypush/events/http.rss.html) and it should create a copy
of the digest under `~/.local/share/platypush/feeds/cache/{date:time}_{feed-title}.[html|pdf]`. The `NewFeedEvent` in
particular is what you need to hook your custom logic to, so that an email is sent to a list of addresses when new
content is available.
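If you want to inspect or reuse the cached digests outside of an event hook, a small helper can pick the most recent one. A minimal sketch, assuming the cache layout described above (one `.html` or `.pdf` file per digest, with the feed title in the file name); `latest_digest` is a hypothetical helper, and it relies on file modification times rather than parsing the `{date:time}` prefix.

```python
from pathlib import Path
from typing import Optional

# Cache directory described above (assumed default location)
CACHE_DIR = Path('~/.local/share/platypush/feeds/cache').expanduser()


def latest_digest(cache_dir: Path = CACHE_DIR,
                  feed_title: Optional[str] = None) -> Optional[Path]:
    """Return the most recently modified .html/.pdf digest, or None if there is none.

    If feed_title is given, only files whose name contains it are considered.
    """
    if not cache_dir.is_dir():
        return None
    candidates = [
        path for path in cache_dir.iterdir()
        if path.suffix in ('.html', '.pdf')
        and (feed_title is None or feed_title in path.name)
    ]
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)


if __name__ == '__main__':
    digest = latest_digest(feed_title='MIT Technology Review')
    print(digest or 'No digest generated yet')
```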
First, configure the Platypush mail plugin you prefer. When it comes to sending emails, you primarily have two options:
- The [`mail.smtp`](https://docs.platypush.tech/en/latest/platypush/plugins/mail.smtp.html) plugin, if you want to
send emails directly through an SMTP server. Platypush configuration:
```yaml
mail.smtp:
  username: you@gmail.com
  password: your-pass
  server: smtp.gmail.com
  port: 465
  ssl: True
```
- The [`google.mail`](https://docs.platypush.tech/en/latest/platypush/plugins/google.mail.html) plugin, if you
want to use the native GMail API to send emails. In that case, first make sure that you have the
dependencies for the Platypush Google module installed:
```shell
[sudo] pip install 'platypush[google]'
```
In this case you’ll also have to create a project on
the [Google Developers console](https://console.developers.google.com/) and download the OAuth credentials:
- Click on “Credentials” in the side menu, then on “Create credentials” > “OAuth client ID”.
- Once generated, you can see your new credentials in the “OAuth 2.0 client IDs” section. Click on the “Download” icon to save them to a JSON file.
- Copy the file to your Platypush device/server under e.g. `~/.credentials/google/client_secret.json`.
- Run the following command on the device to authorize the application:
```shell
python -m platypush.plugins.google.credentials \
"https://www.googleapis.com/auth/gmail.modify" \
~/.credentials/google/client_secret.json \
--noauth_local_webserver
```
At this point the GMail delivery is ready to be used by your Platypush automation.
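For reference, the `mail.smtp` settings above describe a plain SMTP session over implicit TLS: with `ssl: True` on port 465 the connection is encrypted from the start, rather than being upgraded with STARTTLS (typically on port 587). The standard-library sketch below performs the equivalent steps for an HTML digest. It is not how the Platypush plugin is implemented, and `build_digest_message` and `send_digest` are hypothetical helper names.

```python
import smtplib
from email.message import EmailMessage
from typing import List


def build_digest_message(sender: str, recipients: List[str],
                         subject: str, html_body: str) -> EmailMessage:
    """Build an HTML email with a plain-text fallback part."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    msg['Subject'] = subject
    msg.set_content('This digest is best viewed in an HTML-capable mail client.')
    # Turns the message into multipart/alternative with an HTML part
    msg.add_alternative(html_body, subtype='html')
    return msg


def send_digest(msg: EmailMessage, server: str, port: int,
                username: str, password: str) -> None:
    """Send the message over implicit TLS, like `ssl: True` on port 465."""
    with smtplib.SMTP_SSL(server, port) as smtp:
        smtp.login(username, password)
        smtp.send_message(msg)
```

For instance, `send_digest(build_digest_message('you@gmail.com', ['friend@example.com'], 'MIT Technology Review feed digest', html), 'smtp.gmail.com', 465, 'you@gmail.com', 'your-pass')` would mirror the configuration above, where `html` holds the content of a digest file.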
## Connecting the dots
Now that both the RSS parsing logic and the mail integration are in place, we can glue them together through the
[`NewFeedEvent`](https://docs.platypush.tech/en/latest/platypush/events/http.rss.html) event. The recommended way
to configure events in Platypush is now through native Python scripts: the custom YAML-based syntax for events and
procedures was becoming too cumbersome to write and maintain (although it’s still supported), and I feel like going back
to a clean and simple Python API is a better option.
Create and initialize the Platypush scripts directory, if it doesn’t exist already:
```shell
mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts
# Make sure that the scripts module is initialized
touch __init__.py
```
Then, create a new hook on `NewFeedEvent`:
```shell
$EDITOR rss_news.py
```
```python
import os
from typing import List

from platypush.event.hook import hook
from platypush.message.event.http.rss import NewFeedEvent
from platypush.utils import run

# Path to your mailing list - a text file with one address per line
maillist = os.path.expanduser('~/.mail.list')


def get_addresses() -> List[str]:
    # Skip empty lines and lines commented out with '#'
    with open(maillist, 'r') as f:
        return [addr.strip() for addr in f.readlines()
                if addr.strip() and not addr.strip().startswith('#')]


# This hook matches:
#   - event_type=NewFeedEvent
#   - digest_format='html'
#   - source_title='MIT Technology Review'
@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='mail.smtp.send',
            from_='you@yourdomain.com',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read(),
            body_type='html')


# Or, if you opted for the native GMail plugin, use this hook instead
# (keep only one of the two HTML hooks, or each digest will be sent twice):
@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest_gmail(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='google.mail.compose',
            sender='you@gmail.com',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read())


# If instead you want to send the digest in PDF format as an attachment
# (requires digest_format: pdf in the feed configuration):
@hook(NewFeedEvent, digest_format='pdf', source_title='MIT Technology Review')
def send_mit_rss_feed_pdf_digest(event: NewFeedEvent, **_):
    # mail.smtp plugin case (keep only the call for the plugin you configured)
    run(action='mail.smtp.send',
        from_='you@yourdomain.com',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        attachments=[event.args['digest_filename']])

    # google.mail case
    run(action='google.mail.compose',
        sender='you@gmail.com',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        files=[event.args['digest_filename']])
```
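As the comment in the script suggests, the keyword arguments passed to `@hook` act as filters: a hook fires only when the event has the given type and every listed argument matches. The toy dispatcher below is a hypothetical, self-contained illustration of those matching semantics, not Platypush’s actual implementation; `Event`, `NewFeedEvent`, `hook` and `dispatch` are stand-ins defined only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple, Type


@dataclass
class Event:
    """Stand-in for a Platypush event: just a bag of arguments."""
    args: Dict[str, Any] = field(default_factory=dict)


class NewFeedEvent(Event):
    pass


# Registered hooks: (event type, required argument values, callback)
_HOOKS: List[Tuple[Type[Event], Dict[str, Any], Callable[..., Any]]] = []


def hook(event_type: Type[Event], **conditions: Any):
    """Register a callback that fires when the event type and all conditions match."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        _HOOKS.append((event_type, conditions, func))
        return func
    return decorator


def dispatch(event: Event) -> List[Any]:
    """Run every hook whose type and conditions match the event; return their results."""
    results = []
    for event_type, conditions, func in _HOOKS:
        if isinstance(event, event_type) and all(
                event.args.get(key) == value for key, value in conditions.items()):
            results.append(func(event))
    return results


@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def on_html_digest(event: Event, **_) -> str:
    return f"html digest: {event.args['digest_filename']}"


if __name__ == '__main__':
    print(dispatch(NewFeedEvent(args={
        'digest_format': 'html',
        'source_title': 'MIT Technology Review',
        'digest_filename': 'digest.html',
    })))  # prints ['html digest: digest.html']
    print(dispatch(NewFeedEvent(args={
        'digest_format': 'pdf',
        'source_title': 'MIT Technology Review',
    })))  # prints []
```

This also shows why an HTML hook won’t fire for a PDF digest, and why two hooks with identical filters will both run on the same event.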
Finally, create your `~/.mail.list` file with one destination email address per line and start Platypush either from the
command line or as a service. You should receive an email with the first batch of articles shortly after startup, and
a new one whenever new items are available after each `poll_seconds` interval.