diff --git a/static/img/newsletter-1.png b/static/img/newsletter-1.png
new file mode 100644
index 0000000..cb8867e
Binary files /dev/null and b/static/img/newsletter-1.png differ
diff --git a/static/pages/Deliver-customized-newsletters-from-RSS-feeds-with-Platypush.md b/static/pages/Deliver-customized-newsletters-from-RSS-feeds-with-Platypush.md
new file mode 100644
index 0000000..fe2e733
--- /dev/null
+++ b/static/pages/Deliver-customized-newsletters-from-RSS-feeds-with-Platypush.md
@@ -0,0 +1,315 @@
+[//]: # (title: Deliver customized newsletters from RSS feeds with Platypush)
+[//]: # (description: Use the RSS and email integrations to create automated newsletters.)
+[//]: # (image: /img/extension-1.png)
+[//]: # (author: Fabio Manganiello )
+[//]: # (published: 2020-09-06)
+
+I’ve always been a supporter of well-curated newsletters. They give me an opportunity to get a good overview of what
+happened in the fields I follow within a span of a day, a week or a month. However, not all the newsletters fit this
+category. Some don’t think twice before selling email addresses to third parties - and within the blink of an eye
+your mailbox can easily get flooded with messages that you didn’t request. Others may sign up your address for other
+services or newsletters as well, and they often don’t offer much granularity to configure which communications you want
+to receive. Even in the best-case scenario, the most privacy-savvy user may still think twice before signing up for a
+newsletter - you’re giving your personal email address to someone else you don’t necessarily trust, implying “yes, this
+is my address and I’m interested in this subject”. Additionally, most of the newsletters spice up their URLs with
+tracking parameters, so they can easily measure user engagement - something you may not necessarily be happy with.
+Moreover, the customization junkie may also have a valid use case for a more finely tuned selection of content in their
+newsletter - you may want to group some sources together into the same daily/weekly email, or you may be interested only
+in some particular subset of the subjects covered by a newsletter, filtering out those that aren’t relevant, or you may
+want to customize the style of the digest that gets delivered. Finally, a fully automated way to deliver newsletters through 5
+lines of code and the tuning of a couple of parameters is the nirvana for many companies of every size out there.
+
+## Feed up the newsletter
+
+Those who read my articles in the past may know that I’m an avid consumer of RSS feeds. Despite being a 21-year-old
+technology, they do their job very well when it comes to delivering the information that matters without all the noise and
+trackers, and they provide a very high level of integration since they are simple XML documents. However, in spite of all the
+effort I put into staying up-to-date with all my sources, a lot of potentially interesting content inevitably slips through -
+and that’s where newsletters step in, as they filter and group together all the content that was generated in a given
+time frame and periodically deliver it to your inbox.
+
+My ideal solution would be something that combines the best aspects of both worlds: the flexibility of an RSS
+subscription, combined with a flexible way of filtering and aggregating content and sources, with the full package
+delivered to my door in whichever format I like (HTML, PDF, MOBI…). In this article I’m going to show how to achieve
+this goal with a few tools:
+
+- One or more sources that you want to track and that support RSS feeds (in this example I’ll use the [MIT Technology
+  Review RSS feed](https://www.technologyreview.com/feed/), but the procedure works for any RSS feed).
+
+- An email address.
+
+- [Platypush](https://git.platypush.tech/platypush/platypush) to do the heavy lifting - monitor the RSS sources at
+  custom intervals, trigger events when a source has some new content, create a digest out of the new content, and
+  deliver the full package to a list of email addresses.
+
+Let’s cover these points step by step.
+
+## Installing and configuring Platypush
+
+We’ll be using the [`http.poll`](https://platypush.readthedocs.io/en/latest/platypush/backend/http.poll.html) backend
+configured with one or more `RssUpdates` objects to poll our RSS sources at regular intervals and create the digests, and
+either the [`mail.smtp`](https://platypush.readthedocs.io/en/latest/platypush/plugins/mail.smtp.html) plugin or the
+[`google.mail`](https://platypush.readthedocs.io/en/latest/platypush/plugins/google.mail.html) plugin to send the
+digests to our email.
+
+You can install Platypush on any device where you want to run your logic - a Raspberry Pi, an old laptop, a cloud node,
+and so on. We will install the base package with the `rss` module. Optionally, you can install it with the `pdf` module
+as well (if you also want to export your digests to PDF) or the `google` module (if you want to send the newsletter from
+a Gmail address instead of an SMTP server).
+
+The first option is to install the latest stable version through `pip`:
+
+```shell
+[sudo] pip install 'platypush[rss,pdf,google]'
+```
+
+The other option is to install the latest git version:
+
+```shell
+git clone https://git.platypush.tech/platypush/platypush
+cd platypush
+[sudo] pip install '.[rss,pdf,google]'
+```
+
+## Monitoring your RSS feeds
+
+Once the software is installed, create the configuration file `~/.config/platypush/config.yaml` if it doesn't exist
+already and add the configuration for the RSS monitor:
+
+```yaml
+# Generic HTTP endpoint monitor
+backend.http.poll:
+    requests:
+        # Add a new RSS feed to the pool
+        - type: platypush.backend.http.request.rss.RssUpdates
+          # URL to the RSS feed
+          url: https://www.technologyreview.com/feed/
+          # Title of the feed (shown in the head of the digest)
+          title: MIT Technology Review
+          # How often we should monitor this source (24*60*60 secs = once a day)
+          poll_seconds: 86400
+          # Format of the digest (HTML or PDF)
+          digest_format: html
+```
+
+You can also add more sources to the `http.poll` `requests` list, each with its own configuration. Also, you can
+customize the style of your digest by passing some valid CSS to these configuration attributes:
+
+```yaml
+# Style of the body element
+body_style: 'font-size: 20px; font-family: "Merriweather", Georgia, "Times New Roman", Times, serif'
+
+# Style of the main title
+title_style: 'margin-top: 30px'
+
+# Style of the subtitle
+subtitle_style: 'margin-top: 10px; page-break-after: always'
+
+# Style of the article titles
+article_title_style: 'font-size: 1.6em; margin-top: 1em; padding-top: 1em; border-top: 1px solid #999'
+
+# Style of the article link
+article_link_style: 'color: #555; text-decoration: none; border-bottom: 1px dotted; font-size: 0.8em'
+
+# Style of the article content
+article_content_style: 'font-size: 0.8em'
+```
+
+The `digest_format` attribute determines the output format of your digest - you may want to choose `html` if you want to
+deliver a summary of the articles in a newsletter, or `pdf` if you want instead to deliver the full content of each item
+as an attachment to an email address. Bonus point: since you can send PDFs to a Kindle if you
+[configured an email address](https://www.amazon.com/gp/sendtokindle/email),
+this mechanism allows you to deliver the full digest of your RSS feeds to your Kindle's email address.
+
+The [`RssUpdates`](https://github.com/BlackLight/platypush/blob/ac02becba80fafce39d5bbcfc682f7a8fe46f529/platypush/backend/http/request/rss/__init__.py#L21)
+object also provides native integration with the [Mercury Parser API](https://github.com/postlight/mercury-parser-api)
+to automatically scrape the content of a web page - I covered some of these concepts in
+my [previous article](https://blog.platypush.tech/article/Deliver-articles-to-your-favourite-e-reader-using-Platypush)
+on how to parse RSS feeds and send the PDF digest to your e-reader. The same mechanism works well for newsletters too.
+If you want to parse the content of the newsletter as well, all you have to do is configure
+the [`http.webpage`](https://platypush.readthedocs.io/en/latest/platypush/plugins/http.webpage.html) Platypush
+plugin. Since the Mercury API doesn't provide a Python binding, this requires a couple of JavaScript dependencies:
+
+```shell
+# Install Node and NPM, e.g. on Debian:
+apt-get install nodejs npm
+
+# Install the Mercury Parser API
+npm install [-g] @postlight/mercury-parser
+
+# Make sure that the Platypush PDF module dependencies
+# are installed if you plan HTML->PDF conversion
+pip install 'platypush[pdf]'
+```
+
+Then, if you want to parse the full content of the items and generate a PDF digest out of them, change your `http.poll`
+configuration to something like this:
+
+```yaml
+backend.http.poll:
+    requests:
+        - type: platypush.backend.http.request.rss.RssUpdates
+          url: https://www.technologyreview.com/feed/
+          title: MIT Technology Review
+          poll_seconds: 86400
+          # PDF digest format
+          digest_format: pdf
+          # Extract the full content of the items
+          extract_content: True
+```
+
+**WARNING**: Extracting the full content of the articles in an RSS feed has two limitations - a practical one and a
+legal one:
+
+- Some websites may require user login before displaying the full content of an article. Some websites perform such
+  checks client-side - and the parser API can usually circumvent them, especially if the full content of an article is
+  actually just hidden behind a client-side paywall. Some websites, however, implement their user checks server-side too
+  before sending the content to the client - and in those cases the parser API may return only a part of the content or
+  no content at all.
+
+- Always keep in mind that parsing the full content of an article behind a paywall may represent a violation of
+  intellectual property under some jurisdictions, so make sure to do it only for content that is either free or that you
+  have permission to scrape.
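
As a rough illustration of what the polling step configured in this section does conceptually, here is a minimal sketch of how new items could be detected between two polls of an RSS 2.0 document. It uses only the Python standard library and is not Platypush's own `RssUpdates` code; `SAMPLE_FEED`, `parse_items` and `new_items` are hypothetical names, and the sample feed stands in for whatever XML a real poll would download.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Hypothetical sample feed, standing in for the XML fetched from a real source
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First article</title>
      <link>https://example.org/articles/1</link>
    </item>
    <item>
      <title>Second article</title>
      <link>https://example.org/articles/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml: str) -> List[Dict[str, str]]:
    """Return the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        {
            'title': item.findtext('title', default='').strip(),
            'link': item.findtext('link', default='').strip(),
        }
        for item in root.iter('item')
    ]


def new_items(feed_xml: str, seen_links: Set[str]) -> List[Dict[str, str]]:
    """Return only the items whose link was not seen in a previous poll."""
    return [item for item in parse_items(feed_xml)
            if item['link'] not in seen_links]


# Assumption: the first article was already delivered in an earlier digest
already_seen = {'https://example.org/articles/1'}
for item in new_items(SAMPLE_FEED, already_seen):
    print(f"{item['title']}: {item['link']}")
```

Keying the comparison on the item link keeps the sketch short; a real poller would also need to persist the set of seen links between runs.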
+
+## Configuring the mail delivery
+
+When new content is published on a subscribed RSS feed, Platypush will generate
+a [NewFeedEvent](https://platypush.readthedocs.io/en/latest/platypush/events/http.rss.html) and it should create a copy
+of the digest under `~/.local/share/platypush/feeds/cache/{date:time}_{feed-title}.[html|pdf]`. The `NewFeedEvent` in
+particular is the link you need to create your custom logic that sends an email to a list of addresses when new content
+is available.
+
+First, configure the Platypush mail plugin you prefer. When it comes to sending emails you primarily have two options:
+
+- The [`mail.smtp`](https://platypush.readthedocs.io/en/latest/platypush/plugins/mail.smtp.html) plugin - if you want to
+  send emails directly through an SMTP server. Platypush configuration:
+
+```yaml
+mail.smtp:
+    username: you@gmail.com
+    password: your-pass
+    server: smtp.gmail.com
+    port: 465
+    ssl: True
+```
+
+- The [`google.mail`](https://platypush.readthedocs.io/en/latest/platypush/plugins/google.mail.html) plugin - if you
+  want to use the native Gmail API to send emails. If that is the case then first make sure that you have the
+  dependencies for the Platypush Google module installed:
+
+```shell
+[sudo] pip install 'platypush[google]'
+```
+
+In this case you’ll also have to create a project on
+the [Google Developers console](https://console.developers.google.com/) and download the OAuth credentials:
+
+- Select “Credentials” from the context menu, then “OAuth Client ID”.
+
+- Once generated, you can see your new credentials in the “OAuth 2.0 client IDs” section. Click on the “Download” icon to save them to a JSON file.
+
+- Copy the file to your Platypush device/server under e.g. `~/.credentials/google/client_secret.json`.
+
+- Run the following command on the device to authorize the application:
+
+```shell
+python -m platypush.plugins.google.credentials \
+    "https://www.googleapis.com/auth/gmail.modify" \
+    ~/.credentials/google/client_secret.json \
+    --noauth_local_webserver
+```
+
+At this point the Gmail delivery is ready to be used by your Platypush automation.
+
+## Connecting the dots
+
+Now that both the RSS parsing logic and the mail integration are in place, we can glue them together through the
+[`NewFeedEvent`](https://platypush.readthedocs.io/en/latest/platypush/events/http.rss.html) event. The recommended way
+to configure events in Platypush is through native Python scripts - the custom YAML-based syntax for events and
+procedures was becoming too cumbersome to maintain and write (although it’s still supported), and I feel like going back
+to a clean and simple Python API may be a better option.
+
+Create and initialize the Platypush scripts directory, if it doesn’t exist already:
+
+```shell
+mkdir -p ~/.config/platypush/scripts
+cd ~/.config/platypush/scripts
+
+# Make sure that the scripts module is initialized
+touch __init__.py
+```
+
+Then, create a new hook on `NewFeedEvent`:
+
+```shell
+$EDITOR rss_news.py
+```
+
+```python
+import os
+from typing import List
+
+from platypush.event.hook import hook
+from platypush.message.event.http.rss import NewFeedEvent
+from platypush.utils import run
+
+# Path to your mailing list - a text file with one address per line
+maillist = os.path.expanduser('~/.mail.list')
+
+def get_addresses() -> List[str]:
+    with open(maillist, 'r') as f:
+        return [addr.strip() for addr in f.readlines()
+                if addr.strip() and not addr.strip().startswith('#')]
+
+
+# This hook matches:
+# - event_type=NewFeedEvent
+# - digest_format='html'
+# - source_title='MIT Technology Review'
+@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
+def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
+    # The digest output file is stored in event.args['digest_filename']
+    with open(event.args['digest_filename'], 'r') as f:
+        run(action='mail.smtp.send',
+            from_='you@yourdomain.com',
+            to=get_addresses(),
+            subject=f'{event.args.get("source_title")} feed digest',
+            body=f.read(),
+            body_type='html')
+
+# Or, if you opted for the native Gmail plugin, use this hook instead of the one above:
+
+@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
+def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
+    # The digest output file is stored in event.args['digest_filename']
+    with open(event.args['digest_filename'], 'r') as f:
+        run(action='google.mail.compose',
+            sender='you@gmail.com',
+            to=get_addresses(),
+            subject=f'{event.args.get("source_title")} feed digest',
+            body=f.read())
+
+# If instead you want to send the digest in PDF format as an attachment (keep only one of the two run() calls):
+
+@hook(NewFeedEvent, digest_format='pdf', source_title='MIT Technology Review')
+def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
+    # mail.smtp plugin case
+    run(action='mail.smtp.send',
+        from_='you@yourdomain.com',
+        to=get_addresses(),
+        subject=f'{event.args.get("source_title")} feed digest',
+        body='',
+        attachments=[event.args['digest_filename']])
+
+    # google.mail case
+    run(action='google.mail.compose',
+        sender='you@gmail.com',
+        to=get_addresses(),
+        subject=f'{event.args.get("source_title")} feed digest',
+        body='',
+        files=[event.args['digest_filename']])
+```
+
+Finally, create your `~/.mail.list` file with one destination email address per line and start Platypush either from the
+command line or as a service. You should receive your email with the first batch of articles shortly after startup, and
+you'll receive more items if a new batch is available after the configured `poll_seconds` period.
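
The `~/.mail.list` file read by `get_addresses()` is plain text: one address per line, with blank lines and lines starting with `#` skipped. The sketch below shows that parsing in isolation so it can be tried without Platypush installed. The `parse_mail_list` name and the `example.org` addresses are hypothetical, and a temporary file stands in for the real mailing list.

```python
import os
import tempfile
from typing import List


def parse_mail_list(path: str) -> List[str]:
    """Read one address per line, skipping blank lines and '#' comments."""
    with open(path, 'r') as f:
        lines = [line.strip() for line in f]
    return [line for line in lines if line and not line.startswith('#')]


# Write a throwaway mailing list so the sketch can run anywhere
with tempfile.NamedTemporaryFile('w', suffix='.list', delete=False) as tmp:
    tmp.write('# Newsletter recipients\n')
    tmp.write('alice@example.org\n')
    tmp.write('\n')
    tmp.write('  bob@example.org  \n')
    sample_path = tmp.name

addresses = parse_mail_list(sample_path)
os.remove(sample_path)
print(addresses)
```

Keeping comments in the file makes it easy to disable a recipient temporarily without deleting the line.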