Migrated 6th article

This commit is contained in:
Fabio Manganiello 2021-01-28 22:38:06 +01:00
parent 6a4a902dbd
commit d14763d63a
2 changed files with 362 additions and 0 deletions

BIN
static/img/rss-1.jpeg Normal file

@ -0,0 +1,362 @@
[//]: # (title: Deliver articles to your favourite e-reader using Platypush)
[//]: # (description: Leverage the RSS and HTML scraping capabilities of Platypush to set up automations to deliver articles to an e-reader.)
[//]: # (image: /img/rss-1.jpeg)
[//]: # (published: 2019-12-04)

[RSS feeds](https://www.lifewire.com/what-is-an-rss-feed-4684568) are a largely underestimated feature of the web
nowadays, at least outside the circles of geeks. Many apps and paid services exist today to aggregate and curate news
from multiple sources, often delegating the task of selecting articles and ordering them on the screen to an opaque
algorithm, and the world seems to have largely forgotten this two-decade-old technology that already solved the problem
of news curation and aggregation a while ago.

However, RSS (or Atom) feeds are much more widespread than many think: every single respectable news website provides
at least one feed, although some news outlets may not advertise them much for fear of losing organic traffic. Feeds
empower users to create their own news feeds and boards through aggregators, without relying on
the mercy of a cloud-run algorithm. And their structured nature (under the hood, an RSS feed is just a structured XML
document) offers the possibility to build automation pipelines that deliver the content we want wherever we want,
whenever we want, and in whichever format we want.
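
To make the "structured XML" point concrete, here is a minimal sketch that reads the items of a feed using only the Python standard library. The feed content and the `parse_items` helper are made up for illustration; a real feed would be downloaded from the publisher's URL and usually carries more fields.

```python
import xml.etree.ElementTree as ET

# Hand-written, minimal RSS 2.0 document used only for illustration.
SAMPLE_RSS = """
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First article</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second article</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>
""".strip()


def parse_items(rss_text):
    """Return a list of {'title', 'link'} dicts, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [
        {'title': item.findtext('title'), 'link': item.findtext('link')}
        for item in root.iter('item')
    ]


items = parse_items(SAMPLE_RSS)
for entry in items:
    print(entry['title'], '->', entry['link'])
```

Once the items are plain dictionaries, filtering, converting, and forwarding them becomes ordinary scripting work.
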

[IFTTT](https://ifttt.com) is a popular option to build custom logic on RSS feeds. It makes it very intuitive to build
relatively complex rules such as “send me a weekly digest with The Economist articles published in the latest issue” or
“send a Telegram message with the digest from the NYT every day at 6 a.m.” or “send a notification to my mobile whenever
XKCD publishes new comics.” However, IFTTT has recently pivoted to become
a [paid service](https://thenextweb.com/apps/2020/09/10/ifttt-introduces-a-paid-plan-reduces-free-usage-to-3-applets/)
with very limited possibility for free users to create new applets.

In my opinion, however, it's thanks to internet-connected e-readers, such as the Kindle or MobiScribe, as well as web
services like Mercury and Instapaper that can convert a web page into a clean print-friendly format, that RSS feeds can
finally shine at their full brightness.

It's great to have our news sources neatly organized in an aggregator. It's also nice to have the possibility to
configure push notifications upon the publication of new articles or daily/weekly/monthly digests delivered whenever we
like.

But these features solve only the first part of the problem: content distribution. The second part of the problem,
content consumption, comes when we click on a link, delivered on whichever device and in whichever format we like, and
we start reading the actual article.

Such an experience nowadays happens mostly on laptop screens or, worse, tiny smartphone screens, where we're expected to
hectically scroll through content that is often not optimized for mobile and is filled with ads and paywalls, while a
myriad of other notifications demand their share of our attention. Reading lengthy content on a smartphone screen is
arguably as bad an experience as browsing the web on a Kindle.

Wouldn't it be great if we could get our favorite content automatically delivered to our favorite reading device,
properly formatted, in a comfortably readable size, and without all the clutter and distractions? And without having a
backlit screen always in front of our eyes?

In this piece, we'll see how to do this by using several technological tools (an e-reader, a Kindle account, the Mercury
parser, and Instapaper) and how to glue all the pieces together through Platypush.

## Configure your Kindle account for e-mail delivery
I'll assume in this first section that you have a Kindle, a linked Amazon account, and a Gmail account that we'll use to
programmatically send documents to the device via email, although it's also possible to leverage the [`mail.smtp`](https://platypush.readthedocs.io/en/latest/platypush/plugins/mail.smtp.html)
plugin and use another domain for delivering PDFs. We'll also see later how to leverage Instapaper with other devices.

First, you'll have to create an email address associated with your Kindle that'll be used to remotely deliver documents:

- Head to the [Amazon content and device portal](https://amazon.com/mycd), and log in with your Amazon account.
- Click on the second tab (“Your Devices”), and click on the context menu next to the device where your content should
  be delivered.
- You'll see the email address associated with your device. Copy it, or click on “Edit” to change it.
- Click on the third tab (“Settings”), and scroll to the bottom to the section titled “Personal Document Settings.”
- Scroll to the bottom to the section named “Approved Personal Document E-mail List” and add your Gmail address as a
  trusted source.

To check that everything works, you can now try and send a PDF document to your Kindle from your personal email address.
If the device is connected to WiFi, then the document should automatically download within a few seconds.
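
If you'd rather script this test than use a mail client, the message itself can be assembled with Python's standard `email` package. The sketch below only builds the message: the addresses are placeholders, `build_kindle_email` is a hypothetical helper, and the PDF bytes are a stand-in for a real file. Actually sending it would require an SMTP server and credentials, which are left out here.

```python
from email.message import EmailMessage


def build_kindle_email(sender, kindle_address, pdf_bytes, filename):
    """Assemble an email with a single PDF attachment (nothing is sent)."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = kindle_address
    msg['Subject'] = 'Test document'
    msg.set_content('Test delivery to the e-reader.')
    # Attach the document as application/pdf
    msg.add_attachment(pdf_bytes, maintype='application', subtype='pdf',
                       filename=filename)
    return msg


# Placeholder bytes; in practice, read them from an actual PDF file.
msg = build_kindle_email('you@gmail.com', 'your-kindle@kindle.com',
                         b'%PDF-1.4\n% placeholder\n', 'test.pdf')
attachment = next(msg.iter_attachments())
print(attachment.get_filename(), attachment.get_content_type())
```
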
## Configure Platypush
Platypush offers all the ingredients we need for the purpose of this piece. We need, in particular, to build an
automation pipeline that:

- Periodically checks a list of RSS sources for new content
- Preprocesses the new items by simplifying the web page (through the Mercury parser or Instapaper) and optionally
  exports them to PDF
- Programmatically sends emails to your device(s) with the new content

First, install Platypush with the required extras (any device with any compatible OS will do: a Raspberry Pi, an unused
laptop, or a remote server):

```shell
pip install 'platypush[http,pdf,rss,google]'
```

You'll also need to install `npm` and `mercury-parser`. Postlight used to provide a web API for its parser, but
[they've discontinued it, choosing to make the project open-source](https://postlight.com/trackchanges/mercury-goes-open-source):

```shell
# Assuming you're on Debian or a Debian-derived OS
apt-get install nodejs npm
npm install @postlight/mercury-parser
```
Second, link Platypush to your Gmail account to send documents via email:
- Create a new project on the [Google Developers Console](https://console.developers.google.com/).
- Click on “Credentials” from the context menu > OAuth Client ID.
- Once generated, you can see your new credentials in the “OAuth 2.0 client IDs” section. Click on the “Download” icon
to save them to a JSON file.
- Copy the file to your Platypush device/server under e.g., `~/.credentials/client_secret.json`.
- Run the following command on the device to authorize the application:
```shell
python -m platypush.plugins.google.credentials \
    "https://www.googleapis.com/auth/gmail.modify" \
    ~/.credentials/client_secret.json \
    --noauth_local_webserver
```
- Copy the link in your browser; log in with your Google account, if required; and authorize the application.

Now that you've got everything in place, it's time to configure Platypush to process your favorite feeds.

## Create a rule to automatically send articles to your Kindle
The [`http.poll`](https://platypush.readthedocs.io/en/latest/platypush/backend/http.poll.html) backend is a flexible
component that can be configured to poll and process updates from many web resources (JSON, RSS, Atom etc.).

Suppose you want to check for updates
on [The Daily](https://www.nytimes.com/2018/07/16/podcasts/the-daily/how-do-i-listen-to-the-daily.html) RSS feed twice a
day and deliver a digest with the new content to your Kindle.

You'll want to create a configuration like this in `~/.config/platypush/config.yaml`:

```yaml
backend.http.poll:
  requests:
    # This poll will handle an RSS feed
    - type: platypush.backend.http.request.rss.RssUpdates
      # RSS feed URL and title
      url: http://feeds.podtrac.com/zKq6WZZLTlbM
      title: NYT - The Daily
      # How often we want to check for updates
      # 12h = 43200 secs
      poll_seconds: 43200
      # We want to convert content to PDF
      digest_format: pdf
      # We want to parse and extract the content from
      # the web page using Mercury Parser
      extract_content: True
```
Create an event hook under `~/.config/platypush/scripts/` that reacts to a `NewFeedEvent` and sends the processed
content to your Kindle via email:
```python
from platypush.event.hook import hook
from platypush.utils import run

from platypush.message.event.http.rss import NewFeedEvent


# Triggered whenever the RSS poll produces a new digest
@hook(NewFeedEvent)
def on_new_feed_digest(event, **context):
    # Email the generated digest file to the Kindle address
    run('google.mail.compose',
        sender='you@gmail.com',
        to='your-kindle@kindle.com',
        subject=f'{event.title} feed digest',
        body=f'Your {event.title} feed digest delivered to your e-reader',
        files=[event.digest_filename])
```

Restart Platypush. As soon as the application finds items in the target feed that haven't yet been processed, it'll
parse them, convert them to PDF, and trigger a `NewFeedEvent` that'll be captured by your hook, which delivers the
resulting PDF to your Kindle.
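
The "items that haven't yet been processed" part can be pictured as a de-duplication step between polls. The sketch below is only an illustration of that idea, not Platypush's actual implementation: the `select_new_items` helper, the in-memory `seen_links` set, and the item dictionaries are all assumptions made for the example.

```python
def select_new_items(items, seen_links):
    """Return the items whose link hasn't been seen yet, and remember them."""
    new_items = []
    for item in items:
        link = item['link']
        if link not in seen_links:
            seen_links.add(link)
            new_items.append(item)
    return new_items


seen = set()

# First poll: everything is new
first_poll = [{'link': 'https://example.com/a'}, {'link': 'https://example.com/b'}]
first_batch = select_new_items(first_poll, seen)

# Second poll: only the item that wasn't there before is returned
second_poll = [{'link': 'https://example.com/b'}, {'link': 'https://example.com/c'}]
second_batch = select_new_items(second_poll, seen)

print(len(first_batch), len(second_batch))
```

A real service would need to persist the set of processed items across restarts, which this in-memory version doesn't do.
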

You can add more monitored RSS sources by simply adding more items to the `requests` attribute of the `http.poll`
backend. Now enjoy reading your articles on a proper screen, delivered directly to your e-reader once or twice a day.
Tiny smartphone screens, paywalls, pop-ups, and ads feel so much more old-fashioned once you dive into this new
experience.

## Sharing content to your e-reader from your mobile on the fly
RSS feeds are awesome, but they aren't the only way we discover and consume content today.

Many times we scroll through our favorite social-media timeline, bump into an interesting article, start reading it on
our tiny screen, and we'd like to keep reading it later when we're on a bigger screen.

Several tools and products have sprung up to provide a solution to the “parse it, save it, and read it later” problem,
among them [Evernote](https://evernote.com/), [Pocket](https://getpocket.com/),
and [Instapaper](https://instapaper.com/) itself.

Most of them, however, are still affected by the same issue: either they don't do a good job at actually parsing and
extracting the content in a more readable format (Instapaper is the exception: Pocket only saves a link to the original
content, while Evernote's content-parsing capabilities have quite some room for improvement, to say the least), or
they're still bound to the backlit screen of the smartphone or computer that runs them.

Wouldn't it be cool to bump into an interesting article while we scroll our Facebook timeline on our Android device and
with a single click deliver it to our Kindle in a nice and readable format? Let's see how to implement such a rule in
Platypush.

First, we'll need something that runs on our mobile device to programmatically communicate with the instance of
Platypush installed on our Raspberry Pi/computer/server.

I consider [Tasker](https://tasker.joaoapps.com/) one of the best applications suited for this purpose: with Tasker (and
the other related apps developed by joaoapps), it's possible to automate anything on your Android device and create
sophisticated rules that connect it to anything.

There are many ways for Tasker to communicate with Platypush (direct RPC over HTTP calls, using Join with an external
MQTT server to dispatch messages, using an intermediate IFTTT hook, or Pushbullet, etc.), and there are many ways for
Platypush to communicate back to Tasker on your mobile device (using [AutoRemote](https://joaoapps.com/autoremote/) with
the [Platypush plugin](https://platypush.readthedocs.io/en/latest/platypush/plugins/autoremote.html) to send custom
events, using IFTTT with any service connected to your mobile, using the [Join API](https://joaoapps.com/join/api/), or,
again, Pushbullet).

We'll use Pushbullet in this piece because it doesn't require as many configuration steps as other techniques.

- Install [Tasker](https://tasker.joaoapps.com/), [AutoShare](https://joaoapps.com/autoshare/),
  and [Pushbullet](https://pushbullet.com/) on your Android device.
- Go to your Pushbullet account page, and click “Create Access Token” to create a new access token that'll be used by
  Platypush to listen for the messages sent to your account. Enable the Pushbullet plugin and backend on Platypush by
  adding these lines to `~/.config/platypush/config.yaml`:

```yaml
backend.pushbullet:
  token: YOUR-TOKEN
  device: platypush-device

pushbullet:
  enabled: True
```

Also add a procedure to `~/.config/platypush/scripts` that, given a URL as input, extracts the content, converts it to
PDF, and sends it to your Kindle:

```python
import re

from platypush.procedure import procedure
from platypush.utils import run


@procedure
def send_web_page_to_kindle(url, **context):
    # Some apps don't share only the link, but also some
    # text such as "I've found this interesting article
    # on XXX". The following action strips out extra content
    # from the input and only extracts the URL.
    url = re.sub(r"^.*(https?://[^\s]*).*", r"\1", url)

    # Extract the content through the Mercury parser and generate a PDF
    outfile = '/tmp/extract.pdf'
    response = run('http.webpage.simplify', url=url, outfile=outfile)
    title = response.get('title')

    # Rename the file to match the title of the page. Path separators
    # in the title are replaced so that the file name stays valid.
    if title:
        safe_title = re.sub(r'[/\\]', '_', title)
        new_outfile = f'/tmp/{safe_title}.pdf'
        run('file.rename', file=outfile, name=new_outfile)
        outfile = new_outfile

    # Send the file to your Kindle email address
    run('google.mail.compose',
        sender='you@gmail.com',
        to='your-kindle@kindle.com',
        subject=f'{title or "[No Title]"} feed digest',
        body=f'Original URL: {url}',
        files=[outfile])

    # Remove the temporary file
    run('file.unlink', file=outfile)
```
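
The URL-stripping step in the procedure above can be tried on its own. The following is a small variant rather than the exact expression used in the procedure: it uses `re.search`, returns the first URL it finds, and returns `None` when the shared text has no link at all. The `extract_url` name and the sample strings are made up for the example.

```python
import re

# Matches http(s) URLs up to the next whitespace character
URL_PATTERN = re.compile(r"https?://\S+")


def extract_url(shared_text):
    """Return the first URL found in the shared text, or None if there is none."""
    match = URL_PATTERN.search(shared_text)
    return match.group(0) if match else None


shared = "I've found this interesting article https://example.com/post?id=1 on SomeApp"
print(extract_url(shared))
print(extract_url("no link here"))
```

Returning `None` makes it easy to skip the PDF conversion entirely when the share action didn't include a link.
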
- Restart Platypush, and check from Pushbullet that your new virtual device, platypush-device in the example above, has
been created.
- On your mobile, open AutoShare, select “Manage Commands,” and create a new command named, for example, *Send to
Kindle*.
- In the task associated with this trigger, tap the plus icon to add a new action, and select “Push a notification” (the
action with the green Pushbullet icon next to it)
- Select “platypush-device” as a target device, and paste the following JSON as message:
```json
{"type":"request", "action":"procedure.send_web_page_to_kindle", "args": {"url":"%astext"}}
```
- In the example above, `%astext` is a special variable in Tasker that contains the text shared by the source app (in
this case, the link sent to AutoShare).
- Open your browser, and go to the web link of an article you'd like to send to your Kindle. Select Share > AutoShare
  command > Send to Kindle.
- The parsed article should be delivered to your e-reader in an optimized PDF format within seconds.
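
The message pasted into the Pushbullet action is a plain JSON request, so it can also be generated from a script, for instance to test the procedure without a phone. The sketch below only builds the payload; the `build_request` helper is hypothetical, and how the message is then pushed to `platypush-device` is left out.

```python
import json


def build_request(url):
    """Serialize a request for the send_web_page_to_kindle procedure."""
    return json.dumps({
        'type': 'request',
        'action': 'procedure.send_web_page_to_kindle',
        'args': {'url': url},
    })


payload = build_request('Read this: "great" article https://example.com/article')
print(payload)
```

Using `json.dumps` also takes care of escaping, which matters if the shared text contains double quotes: a hand-written template with the raw text substituted in may then no longer be valid JSON.
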
## Using Instapaper on other Android-based e-readers
I've briefly mentioned Instapaper already. I really love both the service and the app. I consider it, in a way, an
implementation of what Evernote should have been but never was.

Just browse to an article on the web, click “Share to Instapaper,” and within one click, that web page will be parsed
into a readable format, with all the clutter and ads removed, and it'll be added to your account.

What makes Instapaper really interesting, though, is the fact that its Android app is really minimal (yet extremely well
designed), and it also runs well on devices that run older versions of Android or aren't that powerful.

That wouldn't be such a big deal in itself if products like
the [MobiScribe](https://www.indiegogo.com/projects/mobiscribe-the-e-ink-notepad#/) weren't slowly hitting the market,
and I hope its example will be followed by others. The MobiScribe can be used both as an e-reader and as an e-ink
notepad, but what really makes it interesting is that it runs Android. Even if it's
an [ancient Android Kit-Kat modified release](https://goodereader.com/blog/reviews/mobiscribe-e-reader-review-a-great-first-effort),
a more recent version should arrive sooner or later.

The presence of an Android OS is what makes this e-reader/tablet much more interesting than other similar products
like [reMarkable](https://remarkable.com/), which has better specs, looks better, and costs more, but has opted instead
to use its own OS, limiting the possibility of running any apps other than those developed by the company itself. Even
if it's an old version of Android that runs on an underpowered device, it's still possible to install some apps on it,
and Instapaper is one of them.

It makes it very easy to enhance your reading experience: simply browse the web, add articles to your Instapaper
account, and deliver them on the fly to your e-reader. If you want, you can also use
the [Instapaper API](https://www.instapaper.com/api/simple) in Platypush to programmatically send content to your
Instapaper account instead of your Kindle. Just create a procedure like this:

```python
from platypush.procedure import procedure
from platypush.utils import run


@procedure
def instapaper_add(url, **context):
    # Add the URL to the Instapaper account through the simple API
    run('http.request.get', url='https://www.instapaper.com/api/add',
        params={
            'url': url,
            'username': 'your_instapaper_username',
            'password': 'your_instapaper_password',
        })
```

I know what you're thinking: the idea of sending my credentials for a web service over a GET request gives me shivers as
well, but Instapaper has only recently [developed an OAuth-based API](https://www.instapaper.com/api) and I haven't
yet managed to implement it in Platypush.

This procedure is now callable through a simple JSON request:

```json
{"type":"request", "action":"procedure.instapaper_add", "args": {"url":"https://custom-url/article"}}
```
If you prefer this method over the Kindle-over-email way, you can just call this procedure in the examples above to
parse the content of the page and save it to your Instapaper account instead of sending an email to your Kindle address.
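
Since the procedure above needs the account password, one small mitigation is to keep it out of the script and read it from the environment. This is only a sketch: the `INSTAPAPER_USERNAME` and `INSTAPAPER_PASSWORD` variable names and the `instapaper_params` helper are assumptions made for the example, not something Platypush or Instapaper defines.

```python
import os


def instapaper_params(url, env=os.environ):
    """Build the request parameters without hard-coding the credentials."""
    try:
        return {
            'url': url,
            'username': env['INSTAPAPER_USERNAME'],
            'password': env['INSTAPAPER_PASSWORD'],
        }
    except KeyError as e:
        raise RuntimeError(f'Missing environment variable: {e.args[0]}') from e


# Example with an explicit mapping instead of the real environment
params = instapaper_params(
    'https://custom-url/article',
    env={'INSTAPAPER_USERNAME': 'me@example.com', 'INSTAPAPER_PASSWORD': 'secret'},
)
print(sorted(params))
```

The resulting dictionary could then be passed as the `params` argument of the `http.request.get` call in the procedure.
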
## Conclusions
The amount of information and news channels available on the web has increased exponentially in the last years, but the
methods to distribute and consume such content, at least when it comes to flexibility, haven't improved much. The
exponential growth of social media and platforms like Google News means a few large companies nowadays decide which
content should appear in front of your eyes, how that content should be delivered to you, and where you can consume it.

Technology should be about creating more opportunities and flexibility, not reducing them, so such a dramatic
centralization shouldn't be acceptable for a power user. Luckily, decades-old technologies like RSS feeds can come to
the rescue, allowing us to tune what we want to read and build automation pipelines that distribute the content wherever
and whenever we like.

Also, e-readers are becoming more and more pervasive, thanks also to the drop in the price of e-ink displays in the last
few years and to more companies and products entering the market. Automating the delivery of web content to e-readers
can really create a new and more comfortable way to stay informed, and it helps us find another great use case for our
Kindle, other than downloading novels to read on the beach.