Migrated newsletter automation article
parent 8fa3be2fbc
commit 91d5a1176b
2 changed files with 315 additions and 0 deletions
BIN static/img/newsletter-1.png (normal file, binary file not shown, size: 8.6 KiB)
@@ -0,0 +1,315 @@
[//]: # (title: Deliver customized newsletters from RSS feeds with Platypush)
[//]: # (description: Use the RSS and email integrations to create automated newsletters.)
[//]: # (image: /img/extension-1.png)
[//]: # (author: Fabio Manganiello <fabio@platypush.tech>)
[//]: # (published: 2020-09-06)

I’ve always been a supporter of well-curated newsletters. They give me an opportunity to get a good overview of what
happened in the fields I follow within a span of a day, a week or a month. However, not all newsletters fit this
category. Some don’t think twice before selling email addresses to third parties, and within the blink of an eye
your mailbox can get flooded with messages that you didn’t request. Others may sign up your address for other
services or newsletters as well, and often they don’t offer much granularity to configure which communications you want
to receive. Even in the best-case scenario, the most privacy-savvy user may still think twice before signing up for a
newsletter: you’re giving your personal email address to someone you don’t necessarily trust, implying “yes, this
is my address and I’m interested in this subject”. Additionally, most newsletters spice up their URLs with
tracking parameters so that they can easily measure user engagement, something you may not be happy with.
Moreover, the customization junkie may also have a valid use case for a more finely tuned selection of content in their
newsletter: you may want to group some sources together into the same daily/weekly email, you may be interested only
in a particular subset of the subjects covered by a newsletter and want to filter out those that aren’t relevant, or
you may want to customize the style of the digest that gets delivered. Finally, a fully automated way to deliver
newsletters through five lines of code and the tuning of a couple of parameters is nirvana for many companies of every
size out there.

## Feed up the newsletter

Those who have read my previous articles may know that I’m an avid consumer of RSS feeds. Despite being a 21-year-old
technology, they do their job very well when it comes to delivering the information that matters without all the noise
and trackers, and, being simple XML documents, they are very easy to integrate with other tools. However, in spite of
all the effort I put into staying up to date with all my sources, a lot of potentially interesting content inevitably
slips through, and that’s where newsletters step in: they filter and group together all the content that was generated
in a given time frame and periodically deliver it to your inbox.

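To make the "simple XML documents" point a bit more concrete, here is a minimal sketch of how an RSS 2.0 feed could be parsed and filtered with nothing but the Python standard library. The sample feed, the `parse_items` and `filter_items` helpers and the keyword list are all made up for illustration; this is not how Platypush parses feeds internally.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# A tiny hand-written RSS 2.0 document (hypothetical content, for illustration only)
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Tech Feed</title>
    <item>
      <title>A new battery chemistry</title>
      <link>https://example.com/battery</link>
      <description>Researchers describe a new battery design.</description>
    </item>
    <item>
      <title>Celebrity gossip roundup</title>
      <link>https://example.com/gossip</link>
      <description>Not what we subscribed for.</description>
    </item>
  </channel>
</rss>
"""


def parse_items(feed_xml: str) -> List[Dict[str, str]]:
    """Extract title, link and description of each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            'title': item.findtext('title', default=''),
            'link': item.findtext('link', default=''),
            'description': item.findtext('description', default=''),
        }
        for item in root.iter('item')
    ]


def filter_items(items: List[Dict[str, str]], keywords: List[str]) -> List[Dict[str, str]]:
    """Keep only the items whose title or description mentions one of the keywords."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        item for item in items
        if any(keyword in f"{item['title']} {item['description']}".lower()
               for keyword in lowered)
    ]


items = parse_items(SAMPLE_FEED)
selected = filter_items(items, ['battery'])
print([item['title'] for item in selected])
```

The same two steps (parse, then filter) are what a custom newsletter needs on top of a plain feed subscription.
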
My ideal solution would combine the best of both worlds: the flexibility of an RSS subscription, a flexible way of
filtering and aggregating content and sources, and the full package delivered at my door in whichever format I like
(HTML, PDF, MOBI…). In this article I’m going to show how to achieve this goal with a few tools:

- One or more sources that you want to track and that support RSS feeds (in this example I’ll use the [MIT Technology
  Review RSS feed](https://www.technologyreview.com/feed/), but the procedure works for any RSS feed).

- An email address.

- [Platypush](https://git.platypush.tech/platypush/platypush) to do the heavy lifting: monitor the RSS sources at
  custom intervals, trigger events when a source has some new content, create a digest out of the new content, and
  deliver the full package to a list of email addresses.

Let’s cover these points step by step.

## Installing and configuring Platypush

We’ll be using the [`http.poll`](https://platypush.readthedocs.io/en/latest/platypush/backend/http.poll.html) backend
configured with one or more `RssUpdates` objects to poll our RSS sources at regular intervals and create the digests, and
either the [`mail.smtp`](https://platypush.readthedocs.io/en/latest/platypush/plugins/mail.smtp.html) plugin or the
[`google.mail`](https://platypush.readthedocs.io/en/latest/platypush/plugins/google.mail.html) plugin to send the
digests to our email.

You can install Platypush on any device where you want to run your logic: a Raspberry Pi, an old laptop, a cloud node,
and so on. We will install the base package with the `rss` module. Optionally, you can install it with the `pdf` module
as well (if you also want to export your digests to PDF) or the `google` module (if you want to send the newsletter from
a Gmail address instead of an SMTP server).

The first option is to install the latest stable version through `pip`:

```shell
[sudo] pip install 'platypush[rss,pdf,google]'
```

The other option is to install the latest git version:

```shell
git clone https://git.platypush.tech/platypush/platypush
cd platypush
[sudo] pip install '.[rss,pdf,google]'
```

## Monitoring your RSS feeds

Once the software is installed, create the configuration file `~/.config/platypush/config.yaml` if it doesn't exist
already and add the configuration for the RSS monitor:

```yaml
# Generic HTTP endpoint monitor
backend.http.poll:
    requests:
        # Add a new RSS feed to the pool
        - type: platypush.backend.http.request.rss.RssUpdates
          # URL to the RSS feed
          url: https://www.technologyreview.com/feed/
          # Title of the feed (shown in the head of the digest)
          title: MIT Technology Review
          # How often we should monitor this source (24*60*60 secs = once a day)
          poll_seconds: 86400
          # Format of the digest (HTML or PDF)
          digest_format: html
```

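If you would rather not do the `poll_seconds` arithmetic by hand, a small sketch like the following can compute the value for other intervals. The `poll_seconds` helper is a hypothetical convenience function of mine, not part of Platypush; only the resulting number goes into the YAML file.

```python
from datetime import timedelta


def poll_seconds(**interval) -> int:
    """Convert an interval such as days=1 or weeks=1 into seconds."""
    return int(timedelta(**interval).total_seconds())


# Values that could be pasted into the poll_seconds attribute
print(poll_seconds(days=1))    # daily digest
print(poll_seconds(weeks=1))   # weekly digest
print(poll_seconds(hours=6))   # four times a day
```
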
You can also add more sources to the `requests` list of `http.poll`, each with its own configuration. You can also
customize the style of your digest by passing some valid CSS to these configuration attributes:

```yaml
# Style of the body element
body_style: 'font-size: 20px; font-family: "Merriweather", Georgia, "Times New Roman", Times, serif'

# Style of the main title
title_style: 'margin-top: 30px'

# Style of the subtitle
subtitle_style: 'margin-top: 10px; page-break-after: always'

# Style of the article titles
article_title_style: 'font-size: 1.6em; margin-top: 1em; padding-top: 1em; border-top: 1px solid #999'

# Style of the article link
article_link_style: 'color: #555; text-decoration: none; border-bottom: 1px dotted; font-size: 0.8em'

# Style of the article content
article_content_style: 'font-size: 0.8em'
```

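To give an idea of where these CSS strings could end up, here is a rough sketch that renders a digest with each style applied as an inline `style` attribute. The `render_digest` function and the item fields (`title`, `link`, `content`) are assumptions made for this example; the actual Platypush template may be structured differently.

```python
from html import escape
from typing import Dict, List

# Style attributes based on the configuration above
STYLES: Dict[str, str] = {
    'body_style': 'font-size: 20px; font-family: "Merriweather", Georgia, "Times New Roman", Times, serif',
    'title_style': 'margin-top: 30px',
    'article_title_style': 'font-size: 1.6em; margin-top: 1em; padding-top: 1em; border-top: 1px solid #999',
    'article_link_style': 'color: #555; text-decoration: none; border-bottom: 1px dotted; font-size: 0.8em',
    'article_content_style': 'font-size: 0.8em',
}


def render_digest(title: str, items: List[Dict[str, str]], styles: Dict[str, str]) -> str:
    """Render a minimal HTML digest, applying each style as an inline attribute."""

    def style(name: str) -> str:
        # escape() also converts double quotes, so font names stay valid inside style="..."
        return escape(styles.get(name, ''))

    parts = [
        f'<body style="{style("body_style")}">',
        f'<h1 style="{style("title_style")}">{escape(title)}</h1>',
    ]

    for item in items:
        parts.append(f'<h2 style="{style("article_title_style")}">{escape(item["title"])}</h2>')
        parts.append(
            f'<a style="{style("article_link_style")}" href="{escape(item["link"])}">'
            f'{escape(item["link"])}</a>'
        )
        parts.append(f'<div style="{style("article_content_style")}">{escape(item["content"])}</div>')

    parts.append('</body>')
    return '\n'.join(parts)


digest = render_digest(
    'MIT Technology Review',
    [{'title': 'Chips & batteries', 'link': 'https://example.com/a', 'content': 'Summary text'}],
    STYLES,
)
print(digest)
```

Inline styles are used here because many mail clients ignore `<style>` blocks, which is one plausible reason to expose the styles as plain CSS strings.
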
The `digest_format` attribute determines the output format of your digest. Choose `html` if you want to
deliver a summary of the articles in a newsletter, or `pdf` if you want instead to deliver the full content of each item
as an attachment to an email address. Bonus point: since you can send PDFs to a Kindle if you
[configured an email address](https://www.amazon.com/gp/sendtokindle/email),
this mechanism allows you to deliver the full digest of your RSS feeds to your Kindle's email address.

The [`RssUpdates`](https://github.com/BlackLight/platypush/blob/ac02becba80fafce39d5bbcfc682f7a8fe46f529/platypush/backend/http/request/rss/__init__.py#L21)
object also provides native integration with the [Mercury Parser API](https://github.com/postlight/mercury-parser-api)
to automatically scrape the content of a web page. I covered some of these concepts in
my [previous article](https://blog.platypush.tech/article/Deliver-articles-to-your-favourite-e-reader-using-Platypush)
on how to parse RSS feeds and send the PDF digest to your e-reader. The same mechanism works well for newsletters too.
If you want to parse the content of the newsletter as well, all you have to do is configure
the [`http.webpage`](https://platypush.readthedocs.io/en/latest/platypush/plugins/http.webpage.html) Platypush
plugin. Since the Mercury API doesn't provide a Python binding, this requires a couple of JavaScript dependencies:

```shell
# Install Node and NPM, e.g. on Debian:
apt-get install nodejs npm

# Install the Mercury Parser API
npm install [-g] @postlight/mercury-parser

# Make sure that the Platypush PDF module dependencies
# are installed if you plan HTML->PDF conversion
pip install 'platypush[pdf]'
```

Then, if you want to parse the full content of the items and generate a PDF digest out of them, change your `http.poll`
configuration to something like this:

```yaml
backend.http.poll:
    requests:
        - type: platypush.backend.http.request.rss.RssUpdates
          url: https://www.technologyreview.com/feed/
          title: MIT Technology Review
          poll_seconds: 86400
          # PDF digest format
          digest_format: pdf
          # Extract the full content of the items
          extract_content: True
```

**WARNING**: Extracting the full content of the articles in an RSS feed has two limitations, a practical one and a
legal one:

- Some websites may require user login before displaying the full content of an article. Some websites perform such
  checks client-side, and the parser API can usually circumvent them, especially if the full content of an article is
  actually just hidden behind a client-side paywall. Some websites, however, implement their user checks server-side too
  before sending the content to the client, and in those cases the parser API may return only a part of the content or
  no content at all.

- Always keep in mind that parsing the full content of an article behind a paywall may represent a violation of
  intellectual property under some jurisdictions, so make sure to do it only for content that is either free or that you
  have permission to scrape.

## Configuring the mail delivery

When new content is published on a subscribed RSS feed, Platypush will generate
a [`NewFeedEvent`](https://platypush.readthedocs.io/en/latest/platypush/events/http.rss.html) and it should create a copy
of the digest under `~/.local/share/platypush/feeds/cache/{date:time}_{feed-title}.[html|pdf]`. The `NewFeedEvent` in
particular is the link you need to create your custom logic that sends an email to a list of addresses when new content
is available.

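If you ever want to look at the cached digests outside of an event hook (for example to re-send the last one by hand), a sketch like the following can pick the most recent file in a directory. The `latest_digest` helper is hypothetical and relies only on file modification times, since I'm not assuming anything about the exact file name format; the demo runs against a temporary directory with made-up file names.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_digest(cache_dir: Path, extension: str = 'html') -> Optional[Path]:
    """Return the most recently modified digest with the given extension, if any."""
    candidates = [p for p in cache_dir.glob(f'*.{extension}') if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


# Demo on a temporary directory, so nothing in the real cache is touched.
# The file names below are made up for the example.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for name, mtime in [('older_Feed.html', 1_000_000_000), ('newer_Feed.html', 1_600_000_000)]:
        path = cache / name
        path.write_text('<html></html>')
        os.utime(path, (mtime, mtime))  # force a known modification time

    newest = latest_digest(cache)
    newest_name = newest.name if newest else None
    missing = latest_digest(cache, extension='pdf')

print(newest_name)
```

Against the real cache you would call it as `latest_digest(Path.home() / '.local/share/platypush/feeds/cache')`.
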
First, configure the Platypush mail plugin you prefer. When it comes to sending emails you primarily have two options:

- The [`mail.smtp`](https://platypush.readthedocs.io/en/latest/platypush/plugins/mail.smtp.html) plugin, if you want to
  send emails directly through an SMTP server. Platypush configuration:

```yaml
mail.smtp:
    username: you@gmail.com
    password: your-pass
    server: smtp.gmail.com
    port: 465
    ssl: True
```

- The [`google.mail`](https://platypush.readthedocs.io/en/latest/platypush/plugins/google.mail.html) plugin, if you
  want to use the native Gmail API to send emails. If that is the case then first make sure that you have the
  dependencies for the Platypush Google module installed:

```shell
[sudo] pip install 'platypush[google]'
```

In this case you’ll also have to create a project on
the [Google Developers console](https://console.developers.google.com/) and download the OAuth credentials:

- Click on “Credentials” in the context menu, then select “OAuth Client ID”.

- Once generated, you can see your new credentials in the “OAuth 2.0 client IDs” section. Click on the “Download” icon
  to save them to a JSON file.

- Copy the file to your Platypush device/server under e.g. `~/.credentials/google/client_secret.json`.

- Run the following command on the device to authorize the application:

```shell
python -m platypush.plugins.google.credentials \
    "https://www.googleapis.com/auth/gmail.modify" \
    ~/.credentials/google/client_secret.json \
    --noauth_local_webserver
```

At this point the Gmail delivery is ready to be used by your Platypush automation.

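For the SMTP option, it may help to see roughly what sending an HTML digest over an SSL connection on port 465 looks like with the Python standard library alone. This is only a sketch of the general technique, not the code of the `mail.smtp` plugin; `build_digest_message` and `send_digest` are hypothetical helpers, and the addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage
from typing import List


def build_digest_message(sender: str, recipients: List[str],
                         subject: str, html_body: str) -> EmailMessage:
    """Build a multipart message with a plain-text fallback and an HTML body."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    msg['Subject'] = subject
    msg.set_content('This digest is best viewed in an HTML-capable mail client.')
    msg.add_alternative(html_body, subtype='html')
    return msg


def send_digest(msg: EmailMessage, server: str, port: int,
                username: str, password: str) -> None:
    """Send the message over an implicit SSL connection (e.g. port 465)."""
    with smtplib.SMTP_SSL(server, port) as smtp:
        smtp.login(username, password)
        smtp.send_message(msg)


# Only the message is built here; nothing is sent until send_digest() is called
message = build_digest_message(
    'you@yourdomain.com',
    ['reader1@example.com', 'reader2@example.com'],
    'MIT Technology Review feed digest',
    '<h1>MIT Technology Review</h1><p>Today\'s articles...</p>',
)
print(message['Subject'])
```

Calling `send_digest(message, 'smtp.gmail.com', 465, 'you@gmail.com', 'your-pass')` would then mirror the values from the YAML configuration.
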
## Connecting the dots

Now that both the RSS parsing logic and the mail integration are in place, we can glue them together through the
[`NewFeedEvent`](https://platypush.readthedocs.io/en/latest/platypush/events/http.rss.html) event. The new advised way
to configure events in Platypush is through native Python scripts. The custom YAML-based syntax for events and
procedures was becoming too cumbersome to write and maintain (although it’s still supported), and I feel like going back
to a clean and simple Python API may be a better option.

Create and initialize the Platypush scripts directory, if it doesn’t exist already:

```shell
mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts

# Make sure that the scripts module is initialized
touch __init__.py
```

Then, create a new hook on `NewFeedEvent`:

```shell
$EDITOR rss_news.py
```

```python
import os
from typing import List

from platypush.event.hook import hook
from platypush.message.event.http.rss import NewFeedEvent
from platypush.utils import run

# Path to your mailing list - a text file with one address per line
maillist = os.path.expanduser('~/.mail.list')


def get_addresses() -> List[str]:
    # Skip empty lines and lines commented out with '#'
    with open(maillist, 'r') as f:
        return [addr.strip() for addr in f.readlines()
                if addr.strip() and not addr.strip().startswith('#')]


# NOTE: the hooks below are alternatives, keep only the one you need.

# This hook matches:
# - event_type=NewFeedEvent
# - digest_format='html'
# - source_title='MIT Technology Review'
@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='mail.smtp.send',
            from_='you@yourdomain.com',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read(),
            body_type='html')


# Or, if you opted for the native Gmail plugin you may want to go for:

@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='google.mail.compose',
            sender='you@gmail.com',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read())


# If instead you want to send the digest in PDF format as an attachment
# (the hook has to match digest_format='pdf' in this case):

@hook(NewFeedEvent, digest_format='pdf', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # mail.smtp plugin case
    run(action='mail.smtp.send',
        from_='you@yourdomain.com',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        attachments=[event.args['digest_filename']])

    # google.mail case (use this call instead of the one above)
    run(action='google.mail.compose',
        sender='you@gmail.com',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        files=[event.args['digest_filename']])
```

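If the way the `@hook` arguments select events looks a bit magic, the following toy registry illustrates the general idea: a hook is called only when the event type and all the listed attributes match. This is a simplified model written for this article, with plain strings instead of event classes; the names `hook` and `dispatch` below are local to the sketch and it is not the Platypush implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

# Registered hooks: (event type, attributes to match, function to call)
_hooks: List[Tuple[str, Dict[str, Any], Callable[..., Any]]] = []


def hook(event_type: str, **criteria):
    """Toy decorator: register a function for events matching the criteria."""
    def decorator(func):
        _hooks.append((event_type, criteria, func))
        return func
    return decorator


def dispatch(event_type: str, **args) -> List[Any]:
    """Call every hook whose event type and criteria match the event arguments."""
    results = []
    for hook_type, criteria, func in _hooks:
        if hook_type != event_type:
            continue
        if all(args.get(key) == value for key, value in criteria.items()):
            results.append(func(**args))
    return results


@hook('NewFeedEvent', digest_format='html', source_title='MIT Technology Review')
def on_html_digest(digest_filename: str, **_):
    return f'html:{digest_filename}'


@hook('NewFeedEvent', digest_format='pdf', source_title='MIT Technology Review')
def on_pdf_digest(digest_filename: str, **_):
    return f'pdf:{digest_filename}'


# Only the hook registered for digest_format='html' fires for this event
print(dispatch('NewFeedEvent',
               digest_format='html',
               source_title='MIT Technology Review',
               digest_filename='/tmp/digest.html'))
```
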
Finally, create your `~/.mail.list` file with one destination email address per line and start Platypush either from the
command line or as a service. You should receive your email with the first batch of articles shortly after startup, and
you'll receive more items whenever a new batch is available after the configured `poll_seconds` period.
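
Before starting the service, it can be worth checking what your mailing list file will actually expand to. The sketch below goes one step further than the `get_addresses` helper in the script: besides skipping blank lines and `#` comments, it also drops duplicates and lines that don't look like an address. The `parse_mail_list` function and the sample content are my own assumptions for the example.

```python
from email.utils import parseaddr
from typing import List

# Hypothetical content of a ~/.mail.list file
SAMPLE_LIST = """
# Friends
alice@example.com
bob@example.com

# Duplicate and malformed lines are skipped
alice@example.com
not-an-address
"""


def parse_mail_list(text: str) -> List[str]:
    """Return the unique, plausible addresses in the list, in their original order."""
    addresses: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # blank line or comment

        _, addr = parseaddr(line)
        if '@' not in addr or addr in addresses:
            continue  # malformed or duplicate entry

        addresses.append(addr)
    return addresses


recipients = parse_mail_list(SAMPLE_LIST)
print(recipients)
```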