[//]: # (title: Deliver customized newsletters from RSS feeds with Platypush)
[//]: # (description: Use the RSS and email integrations to create automated newsletters.)
[//]: # (image: /img/newsletter-1.png)
[//]: # (author: Fabio Manganiello <fabio@platypush.tech>)
[//]: # (published: 2020-09-06)
I've always been a supporter of well-curated newsletters. They give me an opportunity to get a good overview of what
happened in the fields I follow within a span of a day, a week or a month. However, not all newsletters fit this
category. Some don't think twice before selling email addresses to third parties, and in the blink of an eye
your mailbox can get flooded with messages that you didn't request. Others may sign up your address for other
services or newsletters as well, and they often don't offer much granularity to configure which communications you want
to receive. Even in the best-case scenario, the most privacy-savvy user may still think twice before signing up for a
newsletter: you're giving your personal email address to someone you don't necessarily trust, implying “yes, this
is my address and I'm interested in this subject”. Additionally, most newsletters spice up their URLs with
tracking parameters so they can easily measure user engagement, something you may not necessarily be happy with.

Moreover, the customization junkie may also have a valid use case for a more finely tuned selection of content in their
newsletter: you may want to group some sources together into the same daily/weekly email, or you may be interested only
in a particular subset of the subjects covered by a newsletter, filtering out those that aren't relevant, or you may
want to customize the style of the digest that gets delivered. Finally, a fully automated way to deliver newsletters
through 5 lines of code and the tuning of a couple of parameters is nirvana for many companies of every size out there.
## Feed up the newsletter
Those who have read my articles in the past may know that I'm an avid consumer of RSS feeds. Despite being a 21-year-old
technology, they do their job very well when it comes to delivering the information that matters without all the noise
and trackers, and since they are simple XML documents they are very easy to integrate with. However, in spite of all the
effort I put into staying up to date with all my sources, a lot of potentially interesting content inevitably slips
through, and that's where newsletters step in, as they filter and group together all the content that was generated in
a given time frame and periodically deliver it to your inbox.
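
Because a feed is plain XML, the Python standard library is enough to read one. The sketch below is only an
illustration of that point: it uses a small hand-written sample document rather than a real feed, and `parse_items` is
a hypothetical helper, not something Platypush exposes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# A tiny hand-written RSS 2.0 document, used here instead of a real download
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First article</title>
      <link>https://example.org/first</link>
    </item>
    <item>
      <title>Second article</title>
      <link>https://example.org/second</link>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text: str) -> List[Dict[str, str]]:
    """Return the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


items = parse_items(SAMPLE_FEED)
for item in items:
    print(item["title"], "->", item["link"])
```
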
My ideal solution would be something that combines the best of both worlds: the flexibility of an RSS
subscription, a flexible way of filtering and aggregating content and sources, and the full package
delivered at my door in whichever format I like (HTML, PDF, MOBI…). In this article I'm going to show how to achieve
this goal with a few tools:
- One or more sources that you want to track and that support RSS feeds (in this example I'll use the [MIT Technology
Review RSS feed](https://www.technologyreview.com/feed/), but the procedure works for any RSS feed).
- An email address.
- [Platypush](https://git.platypush.tech/platypush/platypush) to do the heavy lifting: monitor the RSS sources at
custom intervals, trigger events when a source has some new content, create a digest out of the new content, and
deliver the full package to a list of email addresses.

Let's cover these points step by step.
## Installing and configuring Platypush
We'll be using the [`http.poll`](https://docs.platypush.tech/en/latest/platypush/backend/http.poll.html) backend
configured with one or more `RssUpdates` objects to poll our RSS sources at regular intervals and create the digests,
and either the [`mail.smtp`](https://docs.platypush.tech/en/latest/platypush/plugins/mail.smtp.html) plugin or the
[`google.mail`](https://docs.platypush.tech/en/latest/platypush/plugins/google.mail.html) plugin to send the
digests to our email.

You can install Platypush on any device where you want to run your logic: a Raspberry Pi, an old laptop, a cloud node,
and so on. We will install the base package with the `rss` module. Optionally, you can install it with the `pdf` module
as well (if you also want to export your digests to PDF) or the `google` module (if you want to send the newsletter from
a Gmail address instead of an SMTP server).
The first option is to install the latest stable version through `pip`:
```shell
[sudo] pip install 'platypush[rss,pdf,google]'
```
The other option is to install the latest git version:
```shell
git clone https://git.platypush.tech/platypush/platypush
cd platypush
[sudo] pip install '.[rss,pdf,google]'
```
## Monitoring your RSS feeds
Once the software is installed, create the configuration file `~/.config/platypush/config.yaml` if it doesn't exist
already and add the configuration for the RSS monitor:
```yaml
# Generic HTTP endpoint monitor
backend.http.poll:
  requests:
    # Add a new RSS feed to the pool
    - type: platypush.backend.http.request.rss.RssUpdates
      # URL to the RSS feed
      url: https://www.technologyreview.com/feed/
      # Title of the feed (shown in the head of the digest)
      title: MIT Technology Review
      # How often we should monitor this source (24*60*60 secs = once a day)
      poll_seconds: 86400
      # Format of the digest (HTML or PDF)
      digest_format: html
```
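Platypush keeps track of which items it has already seen, so each poll only reports what is new. The sketch below
illustrates that idea in isolation; `SeenTracker` is a hypothetical helper written for this example and is not how
Platypush implements it internally.

```python
from typing import Iterable, List, Set


class SeenTracker:
    """Remember which item links were already delivered (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def new_items(self, links: Iterable[str]) -> List[str]:
        """Return the links not seen on previous polls, preserving their order."""
        fresh = []
        for link in links:
            if link not in self._seen:
                self._seen.add(link)
                fresh.append(link)
        return fresh


tracker = SeenTracker()
# First poll: everything is new
first_poll = tracker.new_items(["https://example.org/a", "https://example.org/b"])
# Second poll, one interval later: only the unseen link is reported
second_poll = tracker.new_items(["https://example.org/b", "https://example.org/c"])
```

With a `poll_seconds` of 86400, each call to `new_items` would correspond to one daily check of the feed.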
You can also add more sources to the `http.poll` `requests` object, each with its own configuration. Also, you can
customize the style of your digest by passing some valid CSS to these configuration attributes:
```yaml
# Style of the body element
body_style: 'font-size: 20px; font-family: "Merriweather", Georgia, "Times New Roman", Times, serif'
# Style of the main title
title_style: 'margin-top: 30px'
# Style of the subtitle
subtitle_style: 'margin-top: 10px; page-break-after: always'
# Style of the article titles
article_title_style: 'font-size: 1.6em; margin-top: 1em; padding-top: 1em; border-top: 1px solid #999'
# Style of the article link
article_link_style: 'color: #555; text-decoration: none; border-bottom: 1px dotted; font-size: 0.8em'
# Style of the article content
article_content_style: 'font-size: 0.8em'
```
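To give an idea of what these attributes are for, here is a rough sketch that applies a few of them as inline CSS while
building a small HTML digest. It assumes the styles end up as `style="..."` attributes on the matching elements;
`render_digest` and its markup are made up for this example and are not Platypush's actual template.

```python
from html import escape
from typing import Dict, List


def style_attr(styles: Dict[str, str], key: str) -> str:
    """Return the escaped CSS for a style key, or an empty string if unset."""
    return escape(styles.get(key, ""))


def render_digest(title: str, articles: List[Dict[str, str]], styles: Dict[str, str]) -> str:
    """Build a small HTML digest, applying each style as an inline CSS attribute."""
    parts = [
        '<body style="' + style_attr(styles, "body_style") + '">',
        '<h1 style="' + style_attr(styles, "title_style") + '">' + escape(title) + "</h1>",
    ]
    for article in articles:
        parts.append(
            '<h2 style="' + style_attr(styles, "article_title_style") + '">'
            + escape(article["title"]) + "</h2>"
        )
        parts.append(
            '<a href="' + escape(article["link"]) + '" style="'
            + style_attr(styles, "article_link_style") + '">' + escape(article["link"]) + "</a>"
        )
    parts.append("</body>")
    return "\n".join(parts)


digest_html = render_digest(
    "MIT Technology Review",
    [{"title": "A & B", "link": "https://example.org/a"}],
    {"title_style": "margin-top: 30px", "article_link_style": "color: #555"},
)
```
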
The `digest_format` attribute determines the output format of your digest: choose `html` if you want to
deliver a summary of the articles in a newsletter, or `pdf` if you instead want to deliver the full content of each item
as an attachment to an email address. Bonus point: since you can send PDFs to a Kindle if you
[configured an email address](https://www.amazon.com/gp/sendtokindle/email),
this mechanism allows you to deliver the full digest of your RSS feeds to your Kindle's email address.
The [`RssUpdates`](https://github.com/BlackLight/platypush/blob/ac02becba80fafce39d5bbcfc682f7a8fe46f529/platypush/backend/http/request/rss/__init__.py#L21)
object also provides native integration with the [Mercury Parser API](https://github.com/postlight/mercury-parser-api)
to automatically scrape the content of a web page - I covered some of these concepts in
my [previous article](https://blog.platypush.tech/article/Deliver-articles-to-your-favourite-e-reader-using-Platypush)
on how to parse RSS feeds and send the PDF digest to your e-reader. The same mechanism works well for newsletters too.
If you want to parse the content of the newsletter as well, all you have to do is configure
the [`http.webpage`](https://docs.platypush.tech/en/latest/platypush/plugins/http.webpage.html) Platypush
plugin. Since the Mercury API doesn't provide a Python binding, this requires a couple of JavaScript dependencies:
```shell
# Install Node and NPM, e.g. on Debian:
apt-get install nodejs npm
# Install the Mercury Parser API
npm install [-g] @postlight/mercury-parser
# Make sure that the Platypush PDF module dependencies
# are installed if you plan HTML->PDF conversion
pip install 'platypush[pdf]'
```
Then, if you want to parse the full content of the items and generate a PDF digest out of them, change your `http.poll`
configuration to something like this:
```yaml
backend.http.poll:
  requests:
    - type: platypush.backend.http.request.rss.RssUpdates
      url: https://www.technologyreview.com/feed/
      title: MIT Technology Review
      poll_seconds: 86400
      # PDF digest format
      digest_format: pdf
      # Extract the full content of the items
      extract_content: True
```
**WARNING**: Extracting the full content of the articles in an RSS feed has two limitations — a practical one and a
legal one:
- Some websites may require user login before displaying the full content of an article. Some websites perform such
checks client-side — and the parser API can usually circumvent them, especially if the full content of an article is
actually just hidden behind a client-side paywall. Some websites, however, implement their user checks server-side too
before sending the content to the client — and in those cases the parser API may return only a part of the content or
no content at all.
- Always keep in mind that parsing the full content of an article behind a paywall may represent a violation of
intellectual property under some jurisdictions, so make sure to do it only for content that is either free or that you
have permission to scrape.
## Configuring the mail delivery
When new content is published on a subscribed RSS feed, Platypush will generate
a [`NewFeedEvent`](https://docs.platypush.tech/en/latest/platypush/events/http.rss.html) and it should create a copy
of the digest under `~/.local/share/platypush/feeds/cache/{date:time}_{feed-title}.[html|pdf]`. The `NewFeedEvent` in
particular is the link you need to create your custom logic that sends an email to a list of addresses when new content
is available.
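
If you want to check that digests are actually being written, a small script like the one below can find the most
recent file in that cache directory. It is only a sketch: `latest_digest` and `DEFAULT_CACHE_DIR` are names made up for
this example, and it picks files by extension and modification time rather than by parsing the file name.

```python
from pathlib import Path
from typing import Optional

# Cache directory mentioned above (assumed to be the default location)
DEFAULT_CACHE_DIR = Path.home() / ".local" / "share" / "platypush" / "feeds" / "cache"


def latest_digest(cache_dir: Path, digest_format: str = "html") -> Optional[Path]:
    """Return the most recently modified digest with the given extension, if any."""
    if not cache_dir.is_dir():
        return None
    candidates = list(cache_dir.glob(f"*.{digest_format}"))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    print(latest_digest(DEFAULT_CACHE_DIR, "html"))
```
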
First, configure the Platypush mail plugin you prefer. When it comes to sending emails you primarily have two options:
- The [`mail.smtp`](https://docs.platypush.tech/en/latest/platypush/plugins/mail.smtp.html) plugin — if you want to
send emails directly through an SMTP server. Platypush configuration:
```yaml
mail.smtp:
  username: you@gmail.com
  password: your-pass
  server: smtp.gmail.com
  port: 465
  ssl: True
```
- The [`google.mail`](https://docs.platypush.tech/en/latest/platypush/plugins/google.mail.html) plugin — if you
want to use the native GMail API to send emails. If that is the case then first make sure that you have the
dependencies for the Platypush Google module installed:
```shell
[sudo] pip install 'platypush[google]'
```
In this case you'll also have to create a project on
the [Google Developers console](https://console.developers.google.com/) and download the OAuth credentials:
- Click on “Credentials” from the context menu > OAuth Client ID.
- Once generated, you can see your new credentials in the “OAuth 2.0 client IDs” section. Click on the “Download” icon to save them to a JSON file.
- Copy the file to your Platypush device/server under e.g. `~/.credentials/google/client_secret.json`.
- Run the following command on the device to authorize the application:
```shell
python -m platypush.plugins.google.credentials \
"https://www.googleapis.com/auth/gmail.modify" \
~/.credentials/google/client_secret.json \
--noauth_local_webserver
```
At this point the GMail delivery is ready to be used by your Platypush automation.
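
For reference, this is roughly what an HTML digest email looks like when built with nothing but the Python standard
library. It is a sketch for illustration, independent of Platypush: `build_digest_message` is a hypothetical helper,
the addresses are placeholders, and the plugins above take care of all of this for you.

```python
from email.message import EmailMessage
from typing import List


def build_digest_message(sender: str, recipients: List[str],
                         subject: str, html_body: str) -> EmailMessage:
    """Build a multipart message with a plain-text fallback and an HTML body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    # Plain-text part for clients that can't render HTML
    msg.set_content("This digest is best viewed in an HTML-capable mail client.")
    # HTML part carrying the actual digest
    msg.add_alternative(html_body, subtype="html")
    return msg


message = build_digest_message(
    sender="you@yourdomain.com",
    recipients=["a@example.org", "b@example.org"],
    subject="MIT Technology Review feed digest",
    html_body="<h1>MIT Technology Review</h1>",
)

# To actually send it (requires network access and valid credentials):
# import smtplib
# with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
#     smtp.login("you@gmail.com", "your-pass")
#     smtp.send_message(message)
```
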
## Connecting the dots
Now that both the RSS parsing logic and the mail integration are in place, we can glue them together through the
[`NewFeedEvent`](https://docs.platypush.tech/en/latest/platypush/events/http.rss.html) event. The new advised way
to configure events in Platypush is through native Python scripts: the custom YAML-based syntax for events and
procedures was becoming too cumbersome to write and maintain (although it's still supported), and I feel like going back
to a clean and simple Python API may be a better option.

Create and initialize the Platypush scripts directory, if it doesn't exist already:
```shell
mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts
# Make sure that the scripts module is initialized
touch __init__.py
```
Then, create a new hook on `NewFeedEvent`:
```shell
$EDITOR rss_news.py
```
```python
import os
from typing import List

from platypush.event.hook import hook
from platypush.message.event.http.rss import NewFeedEvent
from platypush.utils import run

# Path to your mailing list - a text file with one address per line
maillist = os.path.expanduser('~/.mail.list')


def get_addresses() -> List[str]:
    # Skip blank lines and lines commented out with '#'
    with open(maillist, 'r') as f:
        return [addr.strip() for addr in f.readlines()
                if addr.strip() and not addr.strip().startswith('#')]


# NOTE: the three hooks below are alternatives that share the same name.
# Keep only the one that matches your mail plugin and digest format.

# This hook matches:
# - event_type=NewFeedEvent
# - digest_format='html'
# - source_title='MIT Technology Review'
@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='mail.smtp.send',
            from_='you@yourdomain.com',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read(),
            body_type='html')


# Or, if you opted for the native GMail plugin you may want to go for:
@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='google.mail.compose',
            sender='you@gmail.com',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read())


# If instead you want to send the digest in PDF format as an attachment
# (the hook matches digest_format='pdf', as set in the http.poll configuration):
@hook(NewFeedEvent, digest_format='pdf', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # mail.smtp plugin case (keep only one of the two calls below)
    run(action='mail.smtp.send',
        from_='you@yourdomain.com',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        attachments=[event.args['digest_filename']])

    # google.mail case
    run(action='google.mail.compose',
        sender='you@gmail.com',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        files=[event.args['digest_filename']])
```
Finally, create your `~/.mail.list` file with one destination email address per line and start Platypush either from the
command line or as a service. You should receive your email with the first batch of articles shortly after startup, and
you'll receive more items whenever a new batch is available after the configured `poll_seconds` interval.
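
As a quick illustration of the `~/.mail.list` format, the sketch below applies the same filtering rule as
`get_addresses()` to a temporary sample file: blank lines and lines starting with `#` are skipped, and surrounding
whitespace is trimmed. `read_addresses` is a standalone variant written for this example, and the addresses are
placeholders.

```python
import tempfile
from pathlib import Path
from typing import List


def read_addresses(path: Path) -> List[str]:
    """Return one address per non-empty line, skipping '#' comments."""
    with open(path, "r") as f:
        return [line.strip() for line in f
                if line.strip() and not line.strip().startswith("#")]


# Sample mailing list with a comment, a blank line and stray whitespace
sample = "# Team\nalice@example.org\n\n  bob@example.org  \n# carol@example.org\n"

with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "mail.list"
    list_file.write_text(sample)
    addresses = read_addresses(list_file)

print(addresses)  # ['alice@example.org', 'bob@example.org']
```
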