[//]: # (title: Deliver customized newsletters from RSS feeds with Platypush)
[//]: # (description: Use the RSS and email integrations to create automated newsletters.)
[//]: # (image: /img/newsletter-1.png)
[//]: # (author: Fabio Manganiello <fabio@platypush.tech>)
[//]: # (published: 2020-09-06)

I’ve always been a supporter of well-curated newsletters. They give me an opportunity to get a good overview of what
happened in the fields I follow within a span of a day, a week or a month. However, not all newsletters fit this
category. Some don’t think twice before selling email addresses to third parties, and within the blink of an eye
your mailbox can get flooded with messages that you didn’t request. Others may sign up your address for other
services or newsletters as well, and often they don’t offer much granularity to configure which communications you want
to receive. Even in the best-case scenario, the most privacy-savvy user may still think twice before signing up for a
newsletter: you’re giving your personal email address to someone you don’t necessarily trust, implying “yes, this
is my address and I’m interested in this subject”. Additionally, most newsletters spice up their URLs with
tracking parameters so they can easily measure user engagement, something you may not necessarily be happy with.
Moreover, the customization junkie may also have a valid use case for a more finely tuned selection of content in their
newsletter: you may want to group some sources together into the same daily/weekly email, or you may be interested only
in a particular subset of the subjects covered by a newsletter and filter out those that aren’t relevant, or you may
want to customize the style of the digest that gets delivered. Finally, a fully automated way to deliver newsletters
through 5 lines of code and the tuning of a couple of parameters is nirvana for many companies of every size out there.

## Feed up the newsletter

Those who read my articles in the past may know that I’m an avid consumer of RSS feeds. Despite being a 21-year-old
technology, they do their job very well when it comes to delivering the information that matters without all the noise
and trackers, and, being simple XML documents, they are very easy to integrate with other tools. However, in spite of
all the effort I put into staying up to date with all my sources, a lot of potentially interesting content inevitably
slips through, and that’s where newsletters step in, as they filter and group together all the content that was
generated in a given time frame and periodically deliver it to your inbox.

My ideal solution would be something that combines the best aspects of both worlds: the flexibility of an RSS
subscription, combined with a flexible way of filtering and aggregating content and sources, with the full package
delivered to my door in whichever format I like (HTML, PDF, MOBI…). In this article I’m going to show how to achieve
this goal with a few tools:

- One or more sources that you want to track and that support RSS feeds (in this example I’ll use the [MIT Technology
  Review RSS feed](https://www.technologyreview.com/feed/), but the procedure works for any RSS feed).

- An email address.

- [Platypush](https://git.platypush.tech/platypush/platypush) to do the heavy lifting: monitor the RSS sources at
  custom intervals, trigger events when a source has some new content, create a digest out of the new content, and
  deliver the full package to a list of email addresses.

Let’s cover these points step by step.

## Installing and configuring Platypush

We’ll be using the [`http.poll`](https://docs.platypush.tech/en/latest/platypush/backend/http.poll.html) backend
configured with one or more `RssUpdates` objects to poll our RSS sources at regular intervals and create the digests, and
either the [`mail.smtp`](https://docs.platypush.tech/en/latest/platypush/plugins/mail.smtp.html) plugin or the
[`google.mail`](https://docs.platypush.tech/en/latest/platypush/plugins/google.mail.html) plugin to send the
digests to our email.

You can install Platypush on any device where you want to run your logic: a Raspberry Pi, an old laptop, a cloud node,
and so on. We will install the base package with the `rss` module. Optionally, you can install it with the `pdf` module
as well (if you also want to export your digests to PDF) or the `google` module (if you want to send the newsletter from
a Gmail address instead of an SMTP server).

The first option is to install the latest stable version through `pip`:

```shell
[sudo] pip install 'platypush[rss,pdf,google]'
```

The other option is to install the latest git version:

```shell
git clone https://git.platypush.tech/platypush/platypush
cd platypush
[sudo] pip install '.[rss,pdf,google]'
```

## Monitoring your RSS feeds

Once the software is installed, create the configuration file `~/.config/platypush/config.yaml` if it doesn't exist
already and add the configuration for the RSS monitor:

```yaml
# Generic HTTP endpoint monitor
backend.http.poll:
    requests:
        # Add a new RSS feed to the pool
        - type: platypush.backend.http.request.rss.RssUpdates
          # URL to the RSS feed
          url: https://www.technologyreview.com/feed/
          # Title of the feed (shown in the head of the digest)
          title: MIT Technology Review
          # How often we should monitor this source (24*60*60 secs = once a day)
          poll_seconds: 86400
          # Format of the digest (HTML or PDF)
          digest_format: html
```

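To make the idea behind the poller more concrete, here is a minimal, standard-library-only sketch of what "checking a feed for new content" boils down to. It is not how Platypush implements `RssUpdates`; the sample feed, the `parse_items` and `new_items` helpers and the in-memory `seen` set are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Made-up RSS 2.0 document standing in for a real feed response
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>https://example.org/1</link></item>
    <item><title>Second post</title><link>https://example.org/2</link></item>
  </channel>
</rss>"""


def parse_items(rss_xml: str) -> List[Dict[str, str]]:
    """Return the title and link of each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            'title': item.findtext('title', default=''),
            'link': item.findtext('link', default=''),
        }
        for item in root.iter('item')
    ]


def new_items(rss_xml: str, seen_links: Set[str]) -> List[Dict[str, str]]:
    """Return the items whose link was not seen before, and remember them."""
    fresh = [i for i in parse_items(rss_xml) if i['link'] not in seen_links]
    seen_links.update(i['link'] for i in fresh)
    return fresh


# Hypothetical state left over from a previous poll
seen: Set[str] = {'https://example.org/1'}
print(new_items(SAMPLE_FEED, seen))  # only the second post is new
print(new_items(SAMPLE_FEED, seen))  # nothing new on the following poll
```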
You can also add more sources to the `http.poll` `requests` list, each with its own configuration. Also, you can
customize the style of your digest by passing some valid CSS to these configuration attributes:

```yaml
# Style of the body element
body_style: 'font-size: 20px; font-family: "Merriweather", Georgia, "Times New Roman", Times, serif'

# Style of the main title
title_style: 'margin-top: 30px'

# Style of the subtitle
subtitle_style: 'margin-top: 10px; page-break-after: always'

# Style of the article titles
article_title_style: 'font-size: 1.6em; margin-top: 1em; padding-top: 1em; border-top: 1px solid #999'

# Style of the article link
article_link_style: 'color: #555; text-decoration: none; border-bottom: 1px dotted; font-size: 0.8em'

# Style of the article content
article_content_style: 'font-size: 0.8em'
```

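As a rough illustration of how such CSS strings can end up in the generated document, the sketch below applies them as inline `style` attributes while building a tiny HTML digest. The `render_digest` function and the markup it produces are hypothetical and not taken from Platypush's own template; only the attribute names mirror the configuration above.

```python
from html import escape
from typing import Dict, List


def render_digest(title: str, articles: List[Dict[str, str]],
                  styles: Dict[str, str]) -> str:
    """Build a minimal HTML digest, applying the configured CSS inline."""

    def style(name: str) -> str:
        # Emit a style attribute only if that style was configured
        value = styles.get(name)
        return f' style="{escape(value, quote=True)}"' if value else ''

    parts = [
        f'<body{style("body_style")}>',
        f'<h1{style("title_style")}>{escape(title)}</h1>',
    ]
    for article in articles:
        link = escape(article['link'], quote=True)
        parts.append(
            f'<h2{style("article_title_style")}>{escape(article["title"])}</h2>'
        )
        parts.append(f'<a href="{link}"{style("article_link_style")}>{link}</a>')
    parts.append('</body>')
    return '\n'.join(parts)


print(render_digest(
    'MIT Technology Review',
    [{'title': 'Example article', 'link': 'https://example.org/article'}],
    {'title_style': 'margin-top: 30px'},
))
```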
The `digest_format` attribute determines the output format of your digest: choose `html` if you want to
deliver a summary of the articles in a newsletter, or `pdf` if you want instead to deliver the full content of each item
as an attachment to an email address. Bonus point: since you can send PDFs to a Kindle if you
[configured an email address](https://www.amazon.com/gp/sendtokindle/email),
this mechanism allows you to deliver the full digest of your RSS feeds to your Kindle's email address.

The [`RssUpdates`](https://github.com/BlackLight/platypush/blob/ac02becba80fafce39d5bbcfc682f7a8fe46f529/platypush/backend/http/request/rss/__init__.py#L21)
object also provides native integration with the [Mercury Parser API](https://github.com/postlight/mercury-parser-api)
to automatically scrape the content of a web page. I covered some of these concepts in
my [previous article](https://blog.platypush.tech/article/Deliver-articles-to-your-favourite-e-reader-using-Platypush)
on how to parse RSS feeds and send the PDF digest to your e-reader. The same mechanism works well for newsletters too.
If you want to parse the content of the newsletter as well, all you have to do is configure
the [`http.webpage`](https://docs.platypush.tech/en/latest/platypush/plugins/http.webpage.html) Platypush
plugin. Since the Mercury API doesn't provide a Python binding, this requires a couple of JavaScript dependencies:

```shell
# Install Node and NPM, e.g. on Debian:
apt-get install nodejs npm

# Install the Mercury Parser API
npm install [-g] @postlight/mercury-parser

# Make sure that the Platypush PDF module dependencies
# are installed if you plan HTML->PDF conversion
pip install 'platypush[pdf]'
```

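If you want to double check that the JavaScript tooling is actually reachable before enabling content extraction, a small preflight script along these lines may help. It is only a sketch: the `check_commands` helper is hypothetical, and it merely looks for the `node` and `npm` executables on your `PATH` without verifying that the Mercury parser package itself is installed.

```python
import shutil
from typing import Dict, Iterable


def check_commands(commands: Iterable[str] = ('node', 'npm')) -> Dict[str, bool]:
    """Report which of the given executables can be found on the PATH."""
    return {cmd: shutil.which(cmd) is not None for cmd in commands}


for name, found in check_commands().items():
    print(f'{name}: {"found" if found else "missing"}')
```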
Then, if you want to parse the full content of the items and generate a PDF digest out of them, change your `http.poll`
configuration to something like this:

```yaml
backend.http.poll:
    requests:
        - type: platypush.backend.http.request.rss.RssUpdates
          url: https://www.technologyreview.com/feed/
          title: MIT Technology Review
          poll_seconds: 86400
          # PDF digest format
          digest_format: pdf
          # Extract the full content of the items
          extract_content: True
```

**WARNING**: Extracting the full content of the articles in an RSS feed has two limitations, a practical one and a
legal one:

- Some websites may require user login before displaying the full content of an article. Some websites perform such
  checks client-side, and the parser API can usually circumvent them, especially if the full content of an article is
  actually just hidden behind a client-side paywall. Some websites, however, implement their user checks server-side too
  before sending the content to the client, and in those cases the parser API may return only a part of the content or
  no content at all.

- Always keep in mind that parsing the full content of an article behind a paywall may represent a violation of
  intellectual property under some jurisdictions, so make sure to do it only for content that is either free or that you
  have permission to scrape.

## Configuring the mail delivery

When new content is published on a subscribed RSS feed, Platypush will generate
a [`NewFeedEvent`](https://docs.platypush.tech/en/latest/platypush/events/http.rss.html) and it should create a copy
of the digest under `~/.local/share/platypush/feeds/cache/{date:time}_{feed-title}.[html|pdf]`. The `NewFeedEvent` in
particular is the link you need to create your custom logic that sends an email to a list of addresses when new content
is available.

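While testing your setup, it can be handy to peek at the most recent digest written to that cache directory. The sketch below assumes the cache location mentioned above and picks the newest file by modification time; the `latest_digest` helper is hypothetical and not part of Platypush.

```python
from pathlib import Path
from typing import Optional

# Cache location as described above; adjust it if your setup differs
CACHE_DIR = Path.home() / '.local' / 'share' / 'platypush' / 'feeds' / 'cache'


def latest_digest(cache_dir: Path = CACHE_DIR,
                  extension: str = 'html') -> Optional[Path]:
    """Return the most recently modified digest, or None if there is none."""
    if not cache_dir.is_dir():
        return None
    digests = [p for p in cache_dir.glob(f'*.{extension}') if p.is_file()]
    return max(digests, key=lambda p: p.stat().st_mtime, default=None)


print(latest_digest())
```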
First, configure the Platypush mail plugin you prefer. When it comes to sending emails you primarily have two options:

- The [`mail.smtp`](https://docs.platypush.tech/en/latest/platypush/plugins/mail.smtp.html) plugin, if you want to
  send emails directly through an SMTP server. Platypush configuration:

```yaml
mail.smtp:
    username: you@gmail.com
    password: your-pass
    server: smtp.gmail.com
    port: 465
    ssl: True
```

- The [`google.mail`](https://docs.platypush.tech/en/latest/platypush/plugins/google.mail.html) plugin, if you
  want to use the native Gmail API to send emails. If that is the case then first make sure that you have the
  dependencies for the Platypush Google module installed:

```shell
[sudo] pip install 'platypush[google]'
```

In this case you’ll also have to create a project on
the [Google Developers console](https://console.developers.google.com/) and download the OAuth credentials:

- Click on “Credentials” from the context menu > OAuth Client ID.

- Once generated, you can see your new credentials in the “OAuth 2.0 client IDs” section. Click on the “Download” icon
  to save them to a JSON file.

- Copy the file to your Platypush device/server under e.g. `~/.credentials/google/client_secret.json`.

- Run the following command on the device to authorize the application:

```shell
python -m platypush.plugins.google.credentials \
    "https://www.googleapis.com/auth/gmail.modify" \
    ~/.credentials/google/client_secret.json \
    --noauth_local_webserver
```

At this point the Gmail delivery is ready to be used by your Platypush automation.

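For reference, the `mail.smtp` settings shown above map fairly directly onto Python's standard library. The following sketch, which is independent of Platypush and uses a hypothetical `build_digest_message` helper with made-up addresses, builds an HTML message with a plain-text fallback; the commented lines show how it could be sent over an implicit-TLS connection on port 465.

```python
from email.message import EmailMessage
from typing import List


def build_digest_message(sender: str, recipients: List[str],
                         subject: str, html_body: str) -> EmailMessage:
    """Create a multipart message with a text fallback and an HTML body."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    msg['Subject'] = subject
    # Plain-text part for clients that cannot render HTML
    msg.set_content('This digest is best viewed in an HTML-capable mail client.')
    # HTML part carrying the actual digest
    msg.add_alternative(html_body, subtype='html')
    return msg


message = build_digest_message(
    sender='you@example.com',
    recipients=['alice@example.com', 'bob@example.com'],
    subject='MIT Technology Review feed digest',
    html_body='<h1>MIT Technology Review</h1><p>Example digest</p>',
)
print(message['To'])

# Sending is left commented out, since it needs real credentials:
#   import smtplib
#   with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp:
#       smtp.login('you@gmail.com', 'your-pass')
#       smtp.send_message(message)
```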
## Connecting the dots

Now that both the RSS parsing logic and the mail integration are in place, we can glue them together through the
[`NewFeedEvent`](https://docs.platypush.tech/en/latest/platypush/events/http.rss.html) event. The recommended way
to configure events in Platypush is now through native Python scripts: the custom YAML-based syntax for events and
procedures was becoming too cumbersome to write and maintain (although it’s still supported), and I feel like going back
to a clean and simple Python API may be a better option.

Create and initialize the Platypush scripts directory, if it doesn’t exist already:

```shell
mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts

# Make sure that the scripts module is initialized
touch __init__.py
```

Then, create a new hook on `NewFeedEvent`:

```shell
$EDITOR rss_news.py
```

```python
import os
from typing import List

from platypush.event.hook import hook
from platypush.message.event.http.rss import NewFeedEvent
from platypush.utils import run

# Path to your mailing list - a text file with one address per line
maillist = os.path.expanduser('~/.mail.list')


def get_addresses() -> List[str]:
    # Skip empty lines and lines commented out with a leading '#'
    with open(maillist, 'r') as f:
        return [addr.strip() for addr in f.readlines()
                if addr.strip() and not addr.strip().startswith('#')]


# NOTE: the hooks below are alternatives for the same feed.
# Keep only the one that matches your mail plugin and digest format.

# This hook matches:
# - event_type=NewFeedEvent
# - digest_format='html'
# - source_title='MIT Technology Review'
@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='mail.smtp.send',
            from_='you@yourdomain.com',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read(),
            body_type='html')


# Or, if you opted for the native Gmail plugin you may want to go for:

@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='google.mail.compose',
            sender='you@gmail.com',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read())


# If instead you want to send the digest in PDF format as an attachment
# (note that the hook has to match digest_format='pdf' in this case):

@hook(NewFeedEvent, digest_format='pdf', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # Use only one of the two calls below, depending on your mail plugin

    # mail.smtp plugin case
    run(action='mail.smtp.send',
        from_='you@yourdomain.com',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        attachments=[event.args['digest_filename']])

    # google.mail case
    run(action='google.mail.compose',
        sender='you@gmail.com',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        files=[event.args['digest_filename']])
```

Finally, create your `~/.mail.list` file with one destination email address per line and start Platypush either from the
command line or as a service. You should receive your email with the first batch of articles shortly after startup, and
you'll receive more items whenever a new batch is available after the configured `poll_seconds` period.
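
If you want to check how your mailing list file will be interpreted before starting the service, the parsing rules used by `get_addresses()` (one address per line, blank lines and `#` comments ignored) can be tried out in isolation. This is a standalone sketch with made-up addresses and a temporary file; `parse_addresses` is a hypothetical name.

```python
import os
import tempfile
from typing import List


def parse_addresses(path: str) -> List[str]:
    """Read one address per line, skipping blank lines and '#' comments."""
    with open(path, 'r') as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith('#')]


# Made-up list content written to a temporary file
sample = (
    '# Friends\n'
    'alice@example.com\n'
    '\n'
    '  bob@example.com  \n'
    '# carol@example.com (paused)\n'
)

with tempfile.NamedTemporaryFile('w', suffix='.list', delete=False) as tmp:
    tmp.write(sample)

addresses = parse_addresses(tmp.name)
os.unlink(tmp.name)
print(addresses)  # ['alice@example.com', 'bob@example.com']
```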