[//]: # (title: Building a better digital reading experience)

[//]: # (description: Bypass client-side restrictions on news and blog articles, archive them and read them on any offline reader)

[//]: # (image: https://s3.platypush.tech/static/images/reading-experience.jpg)

[//]: # (author: Fabio Manganiello <fabio@manganiello.tech>)

[//]: # (published: 2025-06-05)

I was an avid book reader as a kid.

I liked the smell of the paper, the feeling of turning the pages, and the
ability to read books anywhere I wanted, as well as lend them to friends and
later share our reading experiences.

As I grew up and chose a career in tech and a digital-savvy lifestyle, I
started to shift my consumption from paper to screen. But I *still* wanted the
same feeling of a paper book, the same freedom to read wherever I wanted,
without distractions, and without being constantly watched by someone who
would recommend me other products based on what I read or how I read.

I was an early supporter of the Amazon Kindle idea: I quickly moved most of my
physical books to the Kindle, I became a vocal supporter of online magazines
that also provided Kindle subscriptions, and I started to read more and more
on e-ink devices.

Then I noticed that, after an initial spike, not many magazines and blogs
provided Kindle subscriptions or EPub versions of their articles.

So never mind - I started tinkering my way out of it and [wrote an article in
2019](https://blog.platypush.tech/article/Deliver-articles-to-your-favourite-e-reader-using-Platypush)
on how to use [Platypush](https://platypush.tech) with its
[`rss`](https://docs.platypush.tech/platypush/plugins/rss.html),
[`instapaper`](https://docs.platypush.tech/platypush/plugins/instapaper.html) and
[`gmail`](https://docs.platypush.tech/platypush/plugins/google.mail.html)
plugins to subscribe to RSS feeds, parse new articles, convert them to PDF and
deliver them to my Kindle.

Later I moved from the Kindle to the first version of the
[Mobiscribe](https://www.mobiscribe.com), as Amazon became more and more
restrictive in its options for importing and exporting content to and from the
Kindle. Using Calibre and some DRM removal tools to export articles or books I
had legitimately purchased was gradually getting more cumbersome and
error-prone, and the Mobiscribe at the time was an interesting option: it
offered a decent e-ink device, for a decent price, and it ran Android (an
ancient version, but at least one sufficient to run
[Instapaper](https://instapaper.com) and [KOReader](https://koreader.rocks)).

That simplified things a bit, because I no longer needed intermediary delivery
via email to get content onto my Kindle, or Calibre to try and pull things out
of it. I was using Instapaper on all of my devices, including the Mobiscribe;
I could easily scrape and push articles to it through Platypush; and I could
easily keep track of my reading state across multiple devices.

Good things aren't supposed to last though.

Instapaper started to feel quite limited in its capabilities, and I didn't
like the idea of a centralized server holding all of my saved articles. So I
moved to a self-hosted [Wallabag](https://wallabag.org) instance in the
meantime - which isn't perfect, but provides a lot more customization and
control.

Moreover, more and more sites started implementing client-side restrictions
against my scrapers. Instapaper was initially more affected, as it was much
easier for publishers' websites to detect scraping requests coming from the
same subnet, but slowly Wallabag too started bumping into Cloudflare screens,
CAPTCHAs and paywalls.

The Internet Archive provided some temporary relief - I could still archive
articles there, and then instruct my Wallabag instance to read them from the
archived link.

Except that, in the past few months, the Internet Archive has also started
implementing anti-scraping features, and you'll most likely get a Cloudflare
screen if you try to access an article from an external scraper.

## A little ethical note before continuing

_Feel free to skip this part and go to the technical setup section if you
already agree that, if buying isn't owning, then piracy isn't stealing._

I _do not_ condone or support piracy.

I mean, sometimes I do, but being a creator myself I always try to make sure
that, if piracy is the only way to freely access content wherever I want, then
creators are not being harmed.

I don't mind, however, harming the intermediaries that add friction to the
process just to have a piece of the pie, stubbornly rely on unsustainable
business models that sacrifice both the revenue of the authors and the privacy
and freedom of the readers, and prevent me from having a raw file that I can
download and read wherever I want. It's because of those folks that the
digital reading experience, despite all the initial promises, has become much
worse than reading physical books and magazines. So I don't see a big moral
conundrum in pirating to harm those folks and get back my basic freedoms as a
reader.

But I do support creators via Patreon. I pay for subscriptions to digital
magazines that I will never read through their official mobile app anyway.
Every now and then I buy physical books and magazines that I've already read
and really enjoyed, to support the authors, just like I still buy vinyl copies
of albums I really love even though I could just stream them. And I send
one-off donations when I find that some content was particularly useful to me.
And I'd probably support content creators even more if only they allowed me to
pay just for the digital content I want to read - if only there were a viable
digital business model for the occasional reader too, instead of everybody
trying to lock me into a Hotel California subscription ("_you can check out
any time you like, but you can never leave_") just because their business
managers have learned how to use the hammer of recurring revenue, and think
that every problem in the world is a subscription nail to be hit on the head.

I also think that the current business model that runs most of the
high-quality content available online (locking people into apps and
subscriptions in order to view the content) is detrimental to the distribution
of knowledge in what's supposed to be the age of information. If I want to be
exposed to diverse opinions on what's going on in different industries or
different parts of the world, I'd probably need at least a dozen
subscriptions, while in earlier generations folks could just walk into their
local library or buy a single issue of a newspaper every now and then.

If we have no digital alternatives to such simple and established ways of
accessing and spreading knowledge, then piracy is almost a civic duty. It
can't be that high-quality reports and insightful blog articles are locked
behind paywalls, subscriptions and apps, while all that's left for free is
cheap disinformation on social media. Future historians will have a very hard
time deciphering what was going on in the world in the 2020s: most of the
content that was available online is now locked behind paywalls, the companies
that ran those sites and built the apps may be long gone, and if publishers
keep waging war against folks like the Internet Archive, those historians may
end up looking at our age as some kind of strange digital dark age.

I also think that it's my right, as a reader, to be able to consume content on
a medium without distractions - social media buttons, ads, comments, or
anything else that pulls me away from the main content. If the publisher
doesn't provide a solution for that, and I have already paid for the content,
then I should be able to build one myself. Even in an age where attention is
the new currency, we should at least not try to grab people's attention while
they're trying to read some dense content - that's just common sense.

And I also demand the right to access the content I've paid for however I
want.

Do I want to export everything to Markdown, or read it as ASCII art in a
terminal? Do I want to export it to EPub so I can read it on my e-ink device?
Do I want to export it to PDF and email it to one of my students for a
research project, or to myself for later reference? Do I want to access it
without having to use a tracker-ridden mobile app, or without being forced to
see ads despite having paid for a subscription? Well, that's my business. I
firmly believe that it's not an author's or publisher's right to dictate how I
access content after paying for it. Just like in earlier days nobody minded
if, after purchasing a book, I shared it with my kids, or lent it to a friend,
or scanned it and read it on my computer, or made copies of a few pages to
bring to my students or colleagues for a project, or left it on a bench at the
park or on a public bookshelf after reading it.

If some freedoms were legally granted to me before, and they've now been taken
away, then it's not piracy if I keep demanding those freedoms.

And content ownership is another problem. I'll no longer be able to access
content I read during my subscription period once the subscription expires. I
won't be able to pass on the books and magazines I've read in my lifetime to
my kid. I'll never be able to lend them to someone else, the way I would leave
a book I had read on a public bookshelf or a bench at the park for someone
else to pick up.

In other words, buying now grants you a temporary license to access the
content on someone else's device - you don't really own anything.

So, if buying isn't owning, piracy isn't stealing.

And again, to make it very clear, I'll be referring to *personal use* in this
article - the case where you support creators through other means, but the
distribution channel and the business models are the problem, and you just
want your basic freedoms as a content consumer back.

If however you want to share scraped articles on the Web, or even worse profit
from access to them, then you're *really* doing the kind of piracy I can't
condone.

With this out of the way, let's get our hands dirty.

## The setup

A high-level overview of the setup is as follows:

<img alt="High-level overview of the scraper setup" src="https://s3.platypush.tech/static/images/wallabag-scraper-architecture.png" width="650px">

Let's break down the building blocks of this setup:

- **[Redirector](https://addons.mozilla.org/en-US/firefox/addon/redirector/)**
  is a browser extension that allows you to redirect URLs based on custom
  rules as soon as the page is loaded. This is useful to redirect paywalled
  resources to the Internet Archive, which usually stores full copies of the
  content. Even if you have legitimately paid for a subscription to a
  magazine, and you can read the article on the publisher's site or from their
  app, your Wallabag scraper will still be blocked if the site implements
  client-side restrictions or is protected by Cloudflare. So you need to
  redirect the URL to the Internet Archive, which will then return a copy of
  the article that you can scrape.

- **[Platypush](https://platypush.tech)** is a Python-based general-purpose
  automation platform that I've devoted a good chunk of the past decade to
  developing. It allows you to run actions, react to events and control
  devices and services through a unified API and Web interface, and it comes
  with [hundreds of supported integrations](https://docs.platypush.tech).
  We'll use the
  [`wallabag`](https://docs.platypush.tech/platypush/plugins/wallabag.html)
  plugin to push articles to your Wallabag instance, optionally the
  [`rss`](https://docs.platypush.tech/platypush/plugins/rss.html) plugin if
  you want to programmatically subscribe to RSS feeds, scrape articles and
  archive them to Wallabag, and optionally the
  [`ntfy`](https://docs.platypush.tech/platypush/plugins/ntfy.html) plugin to
  send notifications to your mobile device when new articles are available.

- **[Platypush Web extension](https://addons.mozilla.org/en-US/firefox/addon/platypush/)**
  is a browser extension that allows you to interact with Platypush from your
  browser, and it also provides a powerful JavaScript API that you can
  leverage to manipulate the DOM and automate tasks in the browser. It's like
  a [Greasemonkey](https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/)
  or [Tampermonkey](https://addons.mozilla.org/en-US/firefox/addon/tampermonkey/)
  extension that lets you write scripts to customize your browser experience,
  but it also allows you to interact with Platypush and leverage its backend
  capabilities. On top of that, I've also added built-in support for the
  [Mercury Parser API](https://github.com/usr42/mercury-parser), so you can
  easily distill articles - similar to what Firefox does with its [Reader
  Mode](https://support.mozilla.org/en-US/kb/firefox-reader-view-clutter-free-web-pages),
  but here you can customize the layout and modify the original DOM directly,
  and the distilled content can easily be dispatched to any other service or
  application. We'll use it to:

  - Distill the article content from the page, removing all the unnecessary
    elements (ads, comments, etc.) and leaving only the main text and images.

  - Archive the distilled article to Wallabag, so you can read it later from
    any device that has access to your Wallabag instance.

- **[Wallabag](https://wallabag.org)** is a self-hosted read-it-later service
  that allows you to save articles from the Web and read them later, even
  offline. It resembles the features of the ([recently
  defunct](https://support.mozilla.org/en-US/kb/future-of-pocket))
  [Pocket](https://getpocket.com/home). It provides a Web interface, mobile
  apps and browser extensions to access your saved articles, and it can also
  be used as a backend for scraping articles from the Web.

- (_Optional_) **[KOReader](https://koreader.rocks)** is an open-source e-book
  reader that runs on a variety of devices, including any e-ink readers that
  run Android (and even the
  [Remarkable](https://github.com/koreader/koreader/wiki/Installation-on-Remarkable)).
  It has a fairly minimal interface that may take a while to get used to, but
  it's extremely powerful and customizable. I personally prefer it over the
  official Wallabag app - it has a native Wallabag integration, as well as
  OPDS integration to synchronize with my
  [Ubooquity](https://docs.linuxserver.io/images/docker-ubooquity/) server,
  synchronization of highlights and notes to Nextcloud Notes, WebDAV support
  (so you can access anything hosted on e.g. your Nextcloud instance),
  progress sync across devices through their [sync
  server](https://github.com/koreader/koreader-sync-server), and much more.
  It basically gives you a single app to access your saved articles, books,
  notes, highlights, and documents.

- (_Optional_) An Android-based e-book reader to run KOReader on. I have
  recently switched from my old Mobiscribe to an [Onyx BOOX Note Air
  4](https://www.onyxbooxusa.com/onyx-boox-note-air4-c) and I love it. It's
  powerful, the display is great, it runs basically any Android app out there
  (I've had no issues with any apps installed through
  [F-Droid](https://f-droid.org)), and it also ships with a good set of stock
  apps, most of which support WebDAV synchronization - ideal if you have a
  [Nextcloud](https://nextcloud.com) instance to store your documents and
  archived links.

**NOTE**: The Platypush extension only works with Firefox, any Firefox-based
browser, or any browser out there that still supports [Manifest
V2](https://blog.mozilla.org/addons/2024/03/13/manifest-v3-manifest-v2-march-2024-update/).
Manifest V3 has been a disgrace that Google has forced all browser extension
developers to swallow. I won't go into detail here, but the Platypush
extension needs to perform actions (such as calls to custom remote endpoints
and runtime interception of HTTP headers) that are either no longer supported
on Manifest V3, or only supported through laborious workarounds (such as using
the declarative Net Request API to explicitly define what you want to
intercept and which remote endpoints you want to call).

**NOTE 2**: As of June 2025, the Platypush extension is only supported on
Firefox for desktop. A Firefox for Android version [is a work in
progress](https://git.platypush.tech/platypush/platypush-webext/issues/1).

Let's dig deeper into the individual components of this setup.

## Redirector



This is a nice addition if you want to automatically view some links through
the Internet Archive rather than the original site.

You can install it from the [Firefox Add-ons
site](https://addons.mozilla.org/en-US/firefox/addon/redirector/). Once
installed, you can create a bunch of rules (regular expressions are supported)
to redirect URLs from paywalled domains that you visit often to the Internet
Archive.

For example, this regular expression:

```
^(https://([\w-]+).substack.com/p/.*)
```

will match any Substack article URL, and you can redirect it to the Internet
Archive through this URL:

```
https://archive.is/$1
```

The next time you open a Substack article, it will automatically be redirected
to its most recent archived version - or you will be prompted to archive the
URL if it hasn't been archived yet.

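For reference, rules like this can also be managed through Redirector's
import/export feature. The export is a JSON file that looks roughly like the
sketch below (the field names are taken from a Redirector export, but
double-check against an export from your own installation before importing):

```json
{
  "redirects": [
    {
      "description": "Substack to archive.is",
      "includePattern": "^(https://([\\w-]+).substack.com/p/.*)",
      "redirectUrl": "https://archive.is/$1",
      "patternType": "R",
      "appliesTo": ["main_frame"]
    }
  ]
}
```
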
## Wallabag



Wallabag can easily be installed on any server [through
Docker](https://doc.wallabag.org/developer/docker/).

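As an illustration, a minimal `docker-compose.yml` might look like the sketch
below. This assumes the stock `wallabag/wallabag` image with its default
embedded SQLite database; the full list of supported environment variables and
production-grade database options is in the official docs:

```yaml
# Minimal sketch: single-container Wallabag with the default SQLite backend.
services:
  wallabag:
    image: wallabag/wallabag
    restart: unless-stopped
    ports:
      - "8080:80"
    environment:
      # Must match the public URL you'll serve Wallabag from
      - SYMFONY__ENV__DOMAIN_NAME=https://your-domain.com
    volumes:
      - ./data:/var/www/wallabag/data
      - ./images:/var/www/wallabag/web/assets/images
```
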
Follow the documentation to set up your user, and create an API client (which
gives you a client ID and secret) from the Web interface.

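You can verify that the credentials work with a quick call to Wallabag's
standard OAuth endpoint (substitute your own host and values):

```bash
❯ curl -s -X POST https://your-domain.com/oauth/v2/token \
    -d grant_type=password \
    -d client_id=your_client_id \
    -d client_secret=your_client_secret \
    -d username=your_username \
    -d password=your_password
# Should return a JSON payload with access_token and refresh_token
```
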
It's also advised to [set up a reverse
proxy](https://doc.wallabag.org/admin/installation/virtualhosts/#configuration-on-nginx)
in front of Wallabag, so you can easily access it over HTTPS.

Once the reverse proxy is configured, you can generate a certificate for it -
for example, if you use [`certbot`](https://certbot.eff.org/) and `nginx`:

```bash
❯ certbot --nginx -d your-domain.com
```

Then you can access your Wallabag instance at `https://your-domain.com` and
log in with the user you created.

Bonus: I personally find the Web interface of Wallabag quite ugly - the
fluorescent light blue headers are distracting, and the default font and
column width aren't to my taste. So I made a [Greasemonkey/Tampermonkey
script](https://gist.manganiello.tech/fabio/ec9e28170988441d9a091b3fa6535038)
to make it better, if you want (see screenshot above).

## [_Optional_] ntfy

[ntfy](https://ntfy.sh) is a simple HTTP-based pub/sub notification service
that you can use to send notifications to your devices or your browser. It
provides both an [Android
app](https://f-droid.org/en/packages/io.heckel.ntfy/) and a [browser
addon](https://addons.mozilla.org/en-US/firefox/addon/send-to-ntfy/) to send
and receive notifications, allowing you to open saved links directly on your
phone or any other device subscribed to the same topic.

Running it via docker-compose [is quite
straightforward](https://github.com/binwiederhier/ntfy/blob/main/docker-compose.yml).

It's also advised to serve it behind a reverse proxy with HTTPS support,
keeping in mind to set the right headers for the WebSocket paths - an example
nginx configuration:

```nginx
map $http_upgrade $connection_upgrade {
  default upgrade;
  '' close;
}

server {
  server_name notify.example.com;

  location / {
    proxy_pass http://your-internal-ntfy-host:port;

    client_max_body_size 5M;

    proxy_read_timeout 60;
    proxy_connect_timeout 60;
    proxy_redirect off;

    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-Ssl on;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

  location ~ .*/ws/?$ {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host $http_host;
    proxy_pass http://your-internal-ntfy-host:port;
  }
}
```

Once the server is running, you can check the connectivity by opening your
server's main page in your browser.

**NOTE**: Be _careful_ when choosing your ntfy topic name, especially if you
are using a public instance. ntfy by default doesn't require any
authentication for publishing or subscribing to a topic. So choose a random
name (or at least a random prefix/suffix) for your topics and treat them like
a password.

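You can also do a quick end-to-end test from the command line - publishing
with a plain HTTP `POST` is part of ntfy's standard API (the topic name below
is just a placeholder):

```bash
❯ curl -d "Hello from my scraper" https://notify.example.com/my-topic-x7f3a9
```

If the Android app or the browser addon is subscribed to the same topic, the
message should pop up within a second or two.
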
## Platypush

Create a new virtual environment and install Platypush through `pip` (the
plugins we'll use in the first part don't require any additional
dependencies):

```bash
❯ python3 -m venv venv
❯ source venv/bin/activate
❯ pip install platypush
```

Then create a new configuration file `~/.config/platypush/config.yaml` with
the following configuration:

```yaml
# Web server configuration
backend.http:
  # port: 8008

# Wallabag configuration
wallabag:
  server_url: https://your-domain.com
  client_id: your_client_id
  client_secret: your_client_secret
  # Your Wallabag user credentials are required for the first login.
  # It's also advised to keep them here afterwards, so the refresh
  # token can be automatically updated.
  username: your_username
  password: your_password
```

Then you can start the service with:

```bash
❯ platypush
```

You can also create a systemd service to run Platypush in the background:

```bash
❯ mkdir -p ~/.config/systemd/user
❯ cat <<EOF > ~/.config/systemd/user/platypush.service
[Unit]
Description=Platypush service
After=network.target

[Service]
ExecStart=/path/to/venv/bin/platypush
Restart=always
RestartSec=5
EOF
❯ systemctl --user daemon-reload
❯ systemctl --user enable --now platypush.service
```

After starting the service, head over to `http://your_platypush_host:8008` (or
the port you configured in the `backend.http` section) and create a new user
account.

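Once you have a user and an access token (tokens can be generated from the Web
panel's settings), you can sanity-check the service with a raw API call
against its `/execute` endpoint. A sketch with placeholder values - it reuses
the `wallabag.save` action we'll rely on later in this article:

```bash
❯ curl -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $PLATYPUSH_TOKEN" \
    -d '{"type": "request", "action": "wallabag.save", "args": {"url": "https://example.com/article"}}' \
    http://your_platypush_host:8008/execute
```

If everything is wired up correctly, the article should show up in your
Wallabag instance.
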
It's also advised to serve the Platypush Web server behind a reverse proxy
with HTTPS support if you want it to be easily accessible from the browser
extension - a basic `nginx` configuration [is available on the
repo](https://git.platypush.tech/platypush/platypush/src/branch/master/examples/nginx/nginx.sample.conf).

## Platypush Web extension

You can install the Platypush Web extension from the [Firefox Add-ons
site](https://addons.mozilla.org/en-US/firefox/addon/platypush/).

After installing it, click on the extension popup and add the URL of your
Platypush Web server.



Once successfully connected, you should see the device in the main menu, and
you can run commands on it and save actions.

A good place to start familiarizing yourself with the Platypush API is the
_Run Action_ dialog, which allows you to run commands on your server and
provides autocomplete for the available actions, as well as documentation
about their arguments.



The default action mode is _Request_ (i.e. single requests against the API).
You can also pack more actions together on the backend [into
_procedures_](https://docs.platypush.tech/wiki/Quickstart.html#greet-me-with-lights-and-music-when-i-come-home),
which can be written either in the YAML config or as Python scripts (by
default loaded from `~/.config/platypush/scripts`). If correctly configured,
procedures will be available in the _Run Action_ dialog.

The other mode, which we'll use in this article, is _Script_. In this mode you
can write custom JavaScript code that can interact with your browser.



[Here](https://gist.github.com/BlackLight/d80c571705215924abc06a80994fd5f4) is
a sample script that you can use as a reference for the API exposed by the
extension. Some examples include:

- `app.run`, to run an action on the Platypush backend

- `app.getURL`, `app.setURL` and `app.openTab`, to get and set the current
  URL, or open a new tab with a given URL

- `app.axios.get`, `app.axios.post` etc., to perform HTTP requests to external
  services through the Axios library

- `app.getDOM` and `app.setDOM`, to get and set the current page DOM

- `app.mercury.parse`, to distill the current page content using the Mercury
  Parser API

### Reader Mode script

We can put together the building blocks above to create our first script,
which will distill the current page content and swap the current page DOM with
the simplified content - no ads, comments, or other distracting visual
elements. The full content of the script is available
[here](https://gist.manganiello.tech/fabio/c731b57ff6b24d21a8f43fbedde3dc30).

This is akin to what Firefox's [Reader
Mode](https://support.mozilla.org/en-US/kb/firefox-reader-view-clutter-free-web-pages)
does, but with much more room for customization.

Note that this specific script doesn't need any interaction with the Platypush
backend. Everything happens on the client, as the Mercury API is built into
the Platypush Web extension.

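To give an idea of its shape, here is a stripped-down sketch built on the API
methods listed above. The response format of `app.mercury.parse` and the
replacement markup are illustrative - refer to the full script linked above
for the real details:

```javascript
// Sketch: distill the current page and swap the DOM with the clean version.
// Assumes app.mercury.parse returns an object with `title` and `content`
// (HTML) fields - check the full script for the actual shape.
const url = await app.getURL();
const article = await app.mercury.parse(url);

if (article && article.content) {
  await app.setDOM(`
    <html>
      <head><title>${article.title}</title></head>
      <body style="max-width: 40em; margin: auto">
        <h1>${article.title}</h1>
        ${article.content}
      </body>
    </html>
  `);
}
```
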
Switch to _Script_ mode in the _Run Action_ dialog, paste the script content
and click _Save Script_. You can also choose a custom name, icon
([FontAwesome](https://fontawesome.com/icons) icon classes are supported),
color and group for the script. Quite importantly, you can also associate a
keyboard shortcut with it, so you can quickly distill a page without having to
search for the command in the extension popup or in the context menu.

### Save to Wallabag script

Now that we have a script to distill the current page content, we can create
another script that saves the distilled content (if available) to Wallabag,
falling back to the original page content otherwise.

The full content of the script is available
[here](https://gist.manganiello.tech/fabio/8f5b08d8fbaa404bafc6fdeaf9b154b4).
The structure is quite straightforward:

- First, it checks whether the page content has already been "distilled" by
  the Reader Mode script. If so, it saves the distilled content to Wallabag;
  otherwise, it uses the full page body.

- It saves the URL to Wallabag.

- Optionally, it sends a notification over ntfy (see the sketch after this
  list).

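In outline, the script boils down to something like the sketch below. How
`app.run` is invoked, how the distilled content is detected, and the ntfy
topic are all illustrative assumptions here - the real script linked above is
authoritative:

```javascript
// Sketch: save the current page (distilled if available) to Wallabag.
const url = await app.getURL();
const dom = await app.getDOM();

// Hypothetical marker: assume the Reader Mode script tags the distilled
// content with a recognizable element. The real script may differ.
const distilled = dom.querySelector('#distilled-content');
const content = distilled ? distilled.innerHTML : dom.body.innerHTML;

// Run the `wallabag.save` action on the backend (argument names match the
// Python example later in this article).
await app.run('wallabag.save', { url: url, content: content });

// Optional ntfy notification (placeholder topic).
await app.run('ntfy.send', {
  topic: 'my-articles-x7f3a9',
  message: `Saved ${url} to Wallabag`,
});
```
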
Again, feel free to assign a keybinding to this action so you can quickly call
it from any page.

Personally, I've picked `Ctrl+Alt+1` for the Reader Mode script and
`Ctrl+Alt+2` for the Save to Wallabag script, so I can quickly distill a page
and, if it takes me more time to read, send the already simplified content to
Wallabag.

If you don't want to create a keybinding, you can always call these actions
either from the extension popup or from the (right-click) context menu.

## [_Optional_] RSS subscriptions and automated delivery

You now have a way to manually scrape and archive articles from the Web.

If you are also a regular reader of a publication or blog that provides RSS or
Atom feeds, you can automate the process of subscribing to those feeds and
delivering new articles to Wallabag.

Just keep in mind two things if you want to go down this path:

1. It's not advised to subscribe to feeds that publish a lot of articles every
   day, as this will quickly fill up your Wallabag instance and make it hard
   to find the articles you want to read. Stick to feeds that publish one or a
   few articles per day - or at most a dozen - or augment the RSS event hook
   with custom filters that only include links matching some criteria.

2. Unlike the manual actions we saw before, the logic that handles automated
   subscriptions and content delivery runs in the Platypush service (on the
   backend). So it may not be as effective at scraping and distilling articles
   as logic that operates on the client side and can more easily bypass
   client-side restrictions. You may therefore want to pick feeds that don't
   implement aggressive paywalls, aren't behind Cloudflare, and don't
   implement other client-side restrictions.

If you have some good candidates for automated delivery, follow these steps:

- Install the [`rss`](https://docs.platypush.tech/platypush/plugins/rss.html)
  plugin in your Platypush instance:

  ```bash
  (venv) ❯ pip install 'platypush[rss]'
  ```

- If you want to use the Mercury Parser API to distill articles (_optional_),
  install the dependencies for the
  [`http.webpage`](https://docs.platypush.tech/platypush/plugins/http.webpage.html)
  plugin. The Mercury API is only available in JavaScript, so you'll need
  `nodejs` and `npm` installed on your system. The Mercury Parser API is
  optional, but it's usually more successful than the default Wallabag scraper
  at distilling content. On top of that, it also makes it easier to customize
  your requests: if you want to scrape content from paywalled websites that
  you're subscribed to, you can easily pass your credentials or cookies to the
  Mercury API (Wallabag doesn't support customizing the scraping requests).
  Moreover, the Mercury integration also allows you to export the distilled
  content to other formats, such as plain text, HTML, Markdown or PDF - useful
  if you want to save content to other services or applications. For example,
  I find it quite useful to scrape the content of some articles as Markdown
  and save it to my [Nextcloud
  Notes](https://apps.nextcloud.com/apps/notes) or
  [Obsidian](https://obsidian.md).

  ```bash
  # Example for Debian/Ubuntu
  ❯ [sudo] apt install nodejs npm

  # Install the Mercury Parser globally
  ❯ [sudo] npm install -g @postlight/parser
  ```

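  The package also ships a small command-line entry point that you can use to
  verify the install (in recent versions of `@postlight/parser` the command is
  named `postlight-parser`; older releases called it `mercury-parser` - check
  which binary the package actually installed if in doubt):

  ```bash
  ❯ postlight-parser https://example.com/some-article
  # Should print a JSON object with title, content, author etc.
  ```
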
- Add your subscriptions to the `~/.config/platypush/config.yaml` file:

  ```yaml
  rss:
    subscriptions:
      - https://example.com/feed.xml
      - https://example.com/atom.xml

  # Optional
  # http.webpage:
  #   headers:
  #     # These headers will be used in all the requests made by the Mercury Parser.
  #     # You can still override the headers when you call the `http.webpage.simplify`
  #     # action though.
  #     User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
  ```

- Create an event hook that handles new articles from the RSS feeds and
  distills them using the Mercury Parser API. You can e.g. create a
  `~/.config/platypush/scripts/subscriptions.py` file with the following
  content:

  ```python
  import logging
  import re
  import urllib.parse

  from platypush import run, when
  from platypush.events.rss import NewFeedEntryEvent

  logger = logging.getLogger(__name__)

  # Optional: set to False if you don't want to use the Mercury Parser API
  USE_MERCURY_PARSER = True

  # If there are any websites that require specific headers to be passed -
  # for example, paywalled news sites that you're subscribed to and that
  # require authentication - you can specify them here.
  headers_by_domain = {
      'example.com': {
          'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
          'Accept-Language': 'en-US,en;q=0.5',
          'Cookie': 'sessionid=your_cookie_value; other_cookie=other_value',
      },
  }


  def get_headers(url: str) -> dict:
      """
      Get the headers to use for the request based on the URL.
      """
      domain = re.sub(r'^www\.', '', urllib.parse.urlparse(url).netloc)
      return headers_by_domain.get(domain, {})


  @when(NewFeedEntryEvent)
  def scrape_and_save(event: NewFeedEntryEvent, **_):
      """
      Scrape the new article and save it to Wallabag.
      """
      content = None
      logger.info(
          'New article available on %s - title: %s, url: %s',
          event.feed_url,
          event.title,
          event.url,
      )

      if USE_MERCURY_PARSER:
          # Distill the article content using the Mercury Parser API
          response = run(
              'http.webpage.simplify',
              url=event.url,
              format='html',
              headers=get_headers(event.url),
          )

          if not (response and response.get('content')):
              logger.warning('Failed to distill %s through Mercury Parser', event.url)
          else:
              content = response['content']

      # Save the (distilled, if available) content to Wallabag
      run(
          'wallabag.save',
          title=event.title,
          content=content,
          url=event.url,
      )

      logger.info('Saved %s to Wallabag', event.url)
  ```

It is advised to run the Platypush service once _without_ the `@when` hook
above, but with the `rss` plugin configured.

The reason is that, on the first run, the `rss` plugin will fetch all the
entries in the subscribed feeds and trigger a `NewFeedEntryEvent` for each of
them. That, in turn, could end up pushing hundreds of articles simultaneously
to your Wallabag instance - and you may not want that.

The recommended flow instead (which should probably also apply any time you
add new feeds to your subscriptions) is:

1. Add the feeds to your `rss` plugin configuration.

2. Restart the Platypush service and let it process all the
   `NewFeedEntryEvent` events for the existing articles.

3. Add the event hook logic to any file under `~/.config/platypush/scripts`.

4. Restart the service - now only new entries will trigger the events.

## Conclusions

In this article we have seen how to set up a self-hosted solution to scrape
and archive articles from the Web, and how to automate the process through
feed subscriptions.

This is a powerful way to regain control over your reading experience,
hopefully bringing it one step closer to the one you had with paper books or
walks to the local library.

Just remember to do so responsibly, only for personal use, and respecting the
rights of content creators and publishers.

It's fine to get creative and build your own reading experience by bypassing
all the needless friction that has been added as media moved to the digital
space.

But always remember to fund authors and creators in other ways, subscribe to
those who produce high-quality content (even if you don't read that content
from their mobile app), and limit your scraping to personal use.
|
|
@ -1,505 +0,0 @@
|
||||||
[//]: # (title: Read and archive everything)
|
|
||||||
[//]: # (description: Bypass client-side restrictions on news and blog articles, archive them and read them wherever you want)
|
|
||||||
[//]: # (image: /img/twitter2mastodon.png)
|
|
||||||
[//]: # (author: Fabio Manganiello <fabio@manganiello.tech>)
|
|
||||||
[//]: # (published: 2025-06-04)
|
|
||||||
|
|
||||||
I've always been an avid book reader as a kid.
|
|
||||||
|
|
||||||
I liked the smell of the paper, the feeling of turning the pages, and the
|
|
||||||
ability to read anywhere I wanted.
|
|
||||||
|
|
||||||
As I grew and chose a career in tech and a digital-savvy lifestyle, I started
|
|
||||||
to shift my consumption from the paper to the screen. But I *still* wanted the
|
|
||||||
same feeling of a paper book, the same freedom of reading wherever I wanted.
|
|
||||||
|
|
||||||
I was an early support of the Amazon Kindle idea, I quickly moved most of my
|
|
||||||
physical books to the Kindle, I became a vocal supported of online magazines
|
|
||||||
that also provided Kindle subscriptions, and I started to read more and more on
|
|
||||||
e-ink devices.
|
|
||||||
|
|
||||||
Then I noticed that, after an initial spike, not many magazines and blogs
|
|
||||||
provided Kindle subscriptions or EPub versions of their articles.
|
|
||||||
|
|
||||||
So nevermind - I started tinkering my way out of it and [wrote an article in
|
|
||||||
2019](https://blog.platypush.tech/article/Deliver-articles-to-your-favourite-e-reader-using-Platypush)
|
|
||||||
on how to use [Platypush](https://platypush.tech) with its [`rss`](https://docs.platypush.tech/platypush/plugins/rss.html),
|
|
||||||
[`instapaper`](https://docs.platypush.tech/platypush/plugins/instapaper.html) and
|
|
||||||
[`gmail`](https://docs.platypush.tech/platypush/plugins/google.mail.html)
|
|
||||||
plugins to subscribe to RSS feeds, parse new articles, convert them to PDF and
|
|
||||||
deliver them to my Kindle.
|
|
||||||
|
|
||||||
Later I moved from Kindle to the first version of the
|
|
||||||
[Mobiscribe](https://www.mobiscribe.com), as Amazon started to be more and more
|
|
||||||
restrictive in its option to import and export stuff out of the Kindle, using
|
|
||||||
Calibre and some DRM removal tools to export articles or books I had regularly
|
|
||||||
purchased was becoming more cumbersome, and the Mobiscribe at that time was an
|
|
||||||
interesting option because it offered a decent e-ink device, for a decent
|
|
||||||
price, and it ran Android (an ancient version, but at least one that was
|
|
||||||
sufficient to run [Instapaper](https://instapaper.com) and
|
|
||||||
[KOReader](https://koreader.rocks)).
|
|
||||||
|
|
||||||
That simplified things a bit because I didn't need intermediary delivery via
|
|
||||||
email to get stuff on my Kindle or Calibre to try and pull things out of it. I
|
|
||||||
was using Instapaper on all of my devices, included the Mobiscribe, I could
|
|
||||||
easily scrape and push articles to it through Platypush, and I could easily
|
|
||||||
keep track of my reading state across multiple devices.
|
|
||||||
|
|
||||||
Good things aren't supposed to last though.
|
|
||||||
|
|
||||||
Instapaper started to feel quite limited in its capabilities, and I didn't like
|
|
||||||
the idea of a centralized server holding all of my saved articles. So I've
|
|
||||||
moved to a self-hosted [Wallabag](https://wallabag.org) instance in the
|
|
||||||
meantime - which isn't perfect, but provides a lot more customization and
|
|
||||||
control.
|
|
||||||
|
|
||||||
Moreover, more and more sites started implementing client-side restrictions for
|
|
||||||
my scrapers - Instapaper was initially more affected, but slowly Wallabag too
|
|
||||||
started bumping into Cloudflare screens, CAPTCHAs and paywalls.
|
|
||||||
|
|
||||||
So the Internet Archive provided some temporary relief - I could still archive
|
|
||||||
articles there, and then instruct my Wallabag instance to read them from the
|
|
||||||
archive link.
|
|
||||||
|
|
||||||
Except that, in the past few months, the Internet Archive has also started
|
|
||||||
implementing anti-scraping features, and you'll most likely get a Cloudflare
|
|
||||||
screen if you try and access an article from an external scraper.
|
|
||||||
|
|
||||||
## A little ethical note before continuing
|
|
||||||
|
|
||||||
I _do not_ condone nor support piracy.
|
|
||||||
|
|
||||||
I mean, sometimes I do, but being a creator myself I always try to make sure
|
|
||||||
that, if piracy is the only way to freely access content wherever I want, then
|
|
||||||
creators are not being harmed (I don't mind harming any intermediaries that add
|
|
||||||
friction to the process and prevent me from having a raw file that I can
|
|
||||||
download and read wherever I want though).
|
|
||||||
|
|
||||||
So I support creators via Patreon. I pay for subscriptions to digital magazines
|
|
||||||
that I will anyway never read through their official mobile app. I send one-off
|
|
||||||
donations when I find that some content was particularly useful to me. I buy
|
|
||||||
physical books and magazines every now and then from authors or publishers that
|
|
||||||
I want to support. And I'd probably support content creators even more if only
|
|
||||||
they allowed me to pay only for the content I want to read, and not lock me
|
|
||||||
into a Hotel California subscription ("_you can check out any time you like,
|
|
||||||
but you can never leave_") because their PMs only care about recurring revenue.
|
|
||||||
|
|
||||||
I also think that the current business model that runs most of the high-quality
|
|
||||||
content available online (locking people into apps and subscriptions in order
|
|
||||||
to view the content) is detrimental for the distribution of knowledge in what's
|
|
||||||
supposed to be the age of information. If I want to be exposed to diverse
|
|
||||||
opinions on what's going on in different industries or different parts of the
|
|
||||||
world, I probably need at least a dozen subscriptions. And probably pay
|
|
||||||
something off to download special reports. In the earlier days we didn't have
|
|
||||||
to give away so much money if we wanted to access content for our personal
|
|
||||||
research - we could just buy a book or a single issue of a magazine, or even
|
|
||||||
just walk into a library and read content for free. If we have no digital
|
|
||||||
alternatives for such simple and established ways to access knowledge, then
|
|
||||||
piracy becomes almost a civic duty. It can't be that high quality reports or
|
|
||||||
insightful blog articles are locked behind paywalls, subscriptions and apps and
|
|
||||||
all that's left for free is cheap disinformation on social media. Future
|
|
||||||
historians will have a very hard time deciphering what was going on in the
|
|
||||||
world in the 2020s, because most of the content that was available online is
|
|
||||||
now locked behind paywalls, the companies that ran those sites and built the
|
|
||||||
apps may be long gone, and if publishers keep waging war against folks like the
|
|
||||||
Internet Archive, then they may start looking at our age like some kind of
|
|
||||||
strange digital dark age.
|
|
||||||
|
|
||||||
I also think that it's my right, as a reader, to be able to consume content on a medium without distractions - like
|
|
||||||
social media buttons, ads, comments, or other stuff that distracts me from the main content, and if the publisher
|
|
||||||
doesn't provide me with a solution for that, and I have already paid for the content, then I should be able to build a
|
|
||||||
solution myself.
|
|
||||||
|
|
||||||
And I also demand the right to access the content I've paid for however I want.
|
|
||||||
|
|
||||||
Do I want to export everything to Markdown or read it in ASCII art in a
|
|
||||||
terminal? Do I want to export it to EPub so I can read it on my e-ink device?
|
|
||||||
Do I want to access it without having to use their tracker-ridden mobile app,
|
|
||||||
or without being forced to see ads despite having paid for a subscription?
|
|
||||||
Well, that's my business. I firmly believe that it's not an author's or
|
|
||||||
publisher's right to dictate how I access the content after paying for it. Just
|
|
||||||
like in earlier times nobody minded if, after purchasing a book, I would share
|
|
||||||
it with my kids, or lend it to a friend, or scan it and read it on my computer,
|
|
||||||
or make the copies of a few pages to bring to my students or my colleagues.
|
|
||||||
|
|
||||||
If some freedoms were legally granted to me before, and now they've been taken
|
|
||||||
away, then it's not piracy if I keep demanding those freedoms.
|
|
||||||
|
|
||||||
And content ownership is another problem. I'll no longer be able to access
|
|
||||||
content I've read during my subscription period once my subscription expires.
|
|
||||||
I'll not be able to pass on the books or magazine I've read in my lifetime to
|
|
||||||
my kid. I'll never be able to lend it to someone else, just like I would leave
|
|
||||||
a book I had read on a public bookshelf or a bench at the park for someone
|
|
||||||
else to read it.
|
|
||||||
|
|
||||||
In other words, buying now grants you a temporary license to access the content
|
|
||||||
on someone else's devices - you don't really own anything.
|
|
||||||
|
|
||||||
So, if buying isn't owning, then piracy isn't stealing.
|
|
||||||
|
|
||||||
And again, to make it very clear, I'll be referring to *personal usage* in this
|
|
||||||
article. The case where you support creators through other means, but the
|
|
||||||
distribution channel is the problem, and you just want your basic freedoms
|
|
||||||
as a content consumer back.
|
|
||||||
|
|
||||||
If however you start to share scraped articles on the Web, or even worse profit
|
|
||||||
from access to it, then you're *really* doing the kind of piracy I can't
|
|
||||||
condone.
|
|
||||||
|
|
||||||
With this out of the way, let's get our hands dirty.
|
|
||||||
|
|
||||||
## The setup
|
|
||||||
|
|
||||||
My current set up is quite complex. At some point I may package all the moving
|
|
||||||
parts into a single stand-alone application, including both the browser
|
|
||||||
extension and the backend, but at the moment it should be sufficient to get
|
|
||||||
things to work.
|
|
||||||
|
|
||||||
A high-level overview of the setup is as follows:
|
|
||||||
|
|
||||||
<img alt="High-level overview of the scraper setup" src="http://s3.platypush.tech/static/images/wallabag-scraper-architecture.png" width="650px">
|
|
||||||
|
|
||||||
Let's break down the building blocks of this setup:
|
|
||||||
|
|
||||||
- **[Redirector](https://addons.mozilla.org/en-US/firefox/addon/redirector/)**
|
|
||||||
is a browser extension that allows you to redirect URLs based on custom
|
|
||||||
rules as soon as the page is loaded. This is useful to redirect paywalled
|
|
||||||
resources to the Internet Archive, which usually stores full copies of the
|
|
||||||
content. Even if you regularly paid for a subscription to a magazine, and you
|
|
||||||
can read the article on the publisher's site or from their app, your Wallabag
|
|
||||||
scraper will still be blocked if the site implements client-side restrictions
|
|
||||||
or is protected by Cloudflare. So you need to redirect the URL to the Internet
|
|
||||||
Archive, which will then return a copy of the article that you can scrape.
|
|
||||||
|
|
||||||
- **[Platypush](https://platypush.tech)** is a Python-based general-purpose
|
|
||||||
platform for automation that I've devoted a good chunk of the past decade
|
|
||||||
to develop. It allows you to run actions, react to events and control devices
|
|
||||||
and services through a unified API and Web interface, and it comes with
|
|
||||||
[hundreds of supported integrations](https://docs.platypush.tech). We'll use
|
|
||||||
the [`wallabag`](https://docs.platypush.tech/platypush/plugins/wallabag.html)
|
|
||||||
plugin to push articles to your Wallabag instance, and optionally the
|
|
||||||
[`rss`](https://docs.platypush.tech/platypush/plugins/rss.html) plugin if you
|
|
||||||
want to programmatically subscribe to RSS feeds, scrape articles and archive
|
|
||||||
them to Wallabag, and the
|
|
||||||
[`ntfy`](https://docs.platypush.tech/platypush/plugins/ntfy.html) plugin to
|
|
||||||
optionally send notifications to your mobile device when new articles are
|
|
||||||
available.
|
|
||||||
|
|
||||||
- **[Platypush Web extension](https://addons.mozilla.org/en-US/firefox/addon/platypush/)**
|
|
||||||
is a browser extension that allows you to interact with Platypush from your
|
|
||||||
browser, and it also provides a powerful JavaScript API that you can leverage
|
|
||||||
to manipulate the DOM and automate tasks in the browser. It's like a
|
|
||||||
[Greasemonkey](https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/)
|
|
||||||
or [Tampermonkey](https://addons.mozilla.org/en-US/firefox/addon/tampermonkey/)
|
|
||||||
extension that allows you to write custom scripts to customize your browser
|
|
||||||
experience, but it also allows you to interact with Platypush and leverage
|
|
||||||
its backend capabilities. On top of that, I've also added built-in support
|
|
||||||
for the [Mercury Parser API](https://github.com/usr42/mercury-parser) in it,
|
|
||||||
so you can easily distill articles - similar to what Firefox does with its
|
|
||||||
[Reader
|
|
||||||
Mode](https://support.mozilla.org/en-US/kb/firefox-reader-view-clutter-free-web-pages),
|
|
||||||
but in this case you can customize the layout and modify the original DOM
|
|
||||||
directly, and the distilled content can easily be dispatched to any other
|
|
||||||
service or application. We'll use it to:
|
|
||||||
|
|
||||||
- Distill the article content from the page, removing all the
|
|
||||||
unnecessary elements (ads, comments, etc.) and leaving only the main text
|
|
||||||
and images.
|
|
||||||
|
|
||||||
- Temporarily archive the distilled article to a Web server capable of
|
|
||||||
serving static files, so Wallabag can get the full content and bypass any
|
|
||||||
client-side restrictions.
|
|
||||||
|
|
||||||
- Archive the distilled article to Wallabag, so you can read it later
|
|
||||||
from any device that has access to your Wallabag instance.
|
|
||||||
|
|
||||||
- **[Wallabag](https://wallabag.org)** is a self-hosted read-it-later
|
|
||||||
service that allows you to save articles from the Web and read them later,
|
|
||||||
even offline. It resembles the features of the ([recently
|
|
||||||
defunct](https://support.mozilla.org/en-US/kb/future-of-pocket))
|
|
||||||
[Pocket](https://getpocket.com/home). It provides a Web interface, mobile
|
|
||||||
apps and browser extensions to access your saved articles, and it can also be
|
|
||||||
used as a backend for scraping articles from the Web.
|
|
||||||
|
|
||||||
- (_Optional_) **[KOReader](https://koreader.rocks)** is an open-source e-book
  reader that runs on a variety of devices, including any e-ink readers that
  run Android (and even the
  [Remarkable](https://github.com/koreader/koreader/wiki/Installation-on-Remarkable)).
  It has a quite minimal interface that may take a while to get used to, but
  it's extremely powerful and customizable. I personally prefer it over the
  official Wallabag app - it has a native Wallabag integration, as well as
  OPDS integration to synchronize with my
  [Ubooquity](https://docs.linuxserver.io/images/docker-ubooquity/) server,
  synchronization of highlights and notes to Nextcloud Notes, WebDAV support
  (so you can access anything hosted on e.g. your Nextcloud instance), and
  progress sync across devices through their [sync
  server](https://github.com/koreader/koreader-sync-server), and much more. It
  basically gives you a single app to access your saved articles, your books,
  your notes, your highlights, and your documents.

- (_Optional_) An Android-based e-book reader to run KOReader on. I have
  recently switched from my old Mobiscribe to an [Onyx BOOX Note Air
  4](https://www.onyxbooxusa.com/onyx-boox-note-air4-c) and I love it. It's
  powerful, the display is great, it runs basically any Android app out there
  (I've had no issues with any apps installed through
  [F-Droid](https://f-droid.org)), and it also comes with a good set of stock
  apps, most of which support WebDAV synchronization - ideal if you have a
  [Nextcloud](https://nextcloud.com) instance to store your documents and
  archived links.

**NOTE**: The Platypush extension only works on Firefox, on any Firefox-based
browser, or on any browser that still supports [Manifest
V2](https://blog.mozilla.org/addons/2024/03/13/manifest-v3-manifest-v2-march-2024-update/).
Manifest V3 has been a disgrace that Google has forced all browser extension
developers to swallow. I won't go into detail here, but the Platypush
extension needs to perform actions (such as calls to custom remote endpoints
and runtime interception of HTTP headers) that are either no longer supported
on Manifest V3, or only supported through laborious workarounds (such as using
the declarative Net Request API to explicitly define what you want to
intercept and which remote endpoints you want to call).

**NOTE 2**: As of June 2025, the Platypush extension is only supported on
Firefox for desktop. A Firefox for Android version [is a work in
progress](https://git.platypush.tech/platypush/platypush-webext/issues/1).

Let's dig deeper into the individual components of this setup.

## Redirector

This is a nice addition if you want to automatically view some links through
the Internet Archive rather than the original site.

You can install it from the [Firefox Add-ons site](https://addons.mozilla.org/en-US/firefox/addon/redirector/).
Once installed, you can create a bunch of rules (regular expressions are
supported) to redirect URLs from paywalled domains that you visit often to the
Internet Archive.

For example, this regular expression:

```
^(https://([\w-]+).substack.com/p/.*)
```

will match any Substack article URL, and you can redirect it to the Internet
Archive through this URL:

```
https://archive.is/$1
```

Next time you open a Substack article, it will be automatically redirected to
its most recent archived version - or it will prompt you to archive the URL if
it hasn't been archived yet.

## Wallabag

Wallabag can easily be installed on any server [through Docker](https://doc.wallabag.org/developer/docker/).

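If you just want something minimal to get started, a docker-compose sketch
along these lines should do - note that the port mapping, volume paths and
domain variable here are assumptions to adapt to your environment; refer to
the linked docs for the complete setup:

```yaml
# Minimal sketch - adapt ports, volumes and domain to your environment.
services:
  wallabag:
    image: wallabag/wallabag
    restart: unless-stopped
    ports:
      - "8080:80"
    environment:
      # Must match the public URL you'll serve Wallabag from
      - SYMFONY__ENV__DOMAIN_NAME=https://your-domain.com
    volumes:
      - ./data:/var/www/wallabag/data
      - ./images:/var/www/wallabag/web/assets/images
```
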
Follow the documentation to set up your user, and create an API client from
the Web interface - you'll need its client ID and secret to configure the
Platypush `wallabag` plugin later.

It's also advised to [set up a reverse
proxy](https://doc.wallabag.org/admin/installation/virtualhosts/#configuration-on-nginx)
in front of Wallabag, so you can easily access it over HTTPS.

Once the reverse proxy is configured, you can generate a certificate for it -
for example, if you use [`certbot`](https://certbot.eff.org/) and `nginx`:

```bash
certbot --nginx -d your-domain.com
```

Then you can access your Wallabag instance at `https://your-domain.com` and
log in with the user you created.

Bonus: I personally find the Web interface of Wallabag quite ugly - the
fluorescent light blue headers are distracting, and the default font and
column width aren't ideal for my taste. So I made a [Greasemonkey/Tampermonkey
script](https://gist.manganiello.tech/fabio/ec9e28170988441d9a091b3fa6535038)
to make it look better, if you want (see the screenshot above).

## [_Optional_] ntfy

[ntfy](https://ntfy.sh) is a simple HTTP-based pub/sub notification service
that you can use to send notifications to your devices or your browser. It
provides both an [Android app](https://f-droid.org/en/packages/io.heckel.ntfy/)
and a [browser addon](https://addons.mozilla.org/en-US/firefox/addon/send-to-ntfy/)
to send and receive notifications, allowing you to open saved links directly
on your phone or any other device subscribed to the same topic.

Running it via docker-compose [is quite
straightforward](https://github.com/binwiederhier/ntfy/blob/main/docker-compose.yml).

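For reference, a minimal sketch along the lines of the linked example (the
port mapping and cache path are assumptions to adapt to your setup):

```yaml
# Minimal sketch based on the linked docker-compose.yml
services:
  ntfy:
    image: binwiederhier/ntfy
    command: serve
    restart: unless-stopped
    ports:
      - "8090:80"
    volumes:
      - ./cache:/var/cache/ntfy
```
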
It's also advised to serve it behind a reverse proxy with HTTPS support,
keeping in mind to set the right headers for the WebSocket paths - example
nginx configuration:

```nginx
map $http_upgrade $connection_upgrade {
  default upgrade;
  '' close;
}

server {
  server_name notify.example.com;

  location / {
    proxy_pass http://your-internal-ntfy-host:port;

    client_max_body_size 5M;

    proxy_read_timeout 60;
    proxy_connect_timeout 60;
    proxy_redirect off;

    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-Ssl on;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

  location ~ .*/ws/?$ {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host $http_host;
    proxy_pass http://your-internal-ntfy-host:port;
  }
}
```

Once the server is running, you can check the connectivity by opening your
server's main page in your browser.

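You can also verify the whole pipeline with a quick test publish - the topic
name below is just an example, and any device subscribed to the same topic
should receive the message:

```bash
# Publish a test message to an example topic
curl -d "Hello from ntfy" https://notify.example.com/test-topic
```
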
## Local Web server

This approach uses an intermediary Web server to temporarily archive the
distilled article content, if available, and instructs Wallabag to parse the
article from there.

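Any server capable of serving static files will do. For example, a minimal
`nginx` sketch - the hostname and directory are assumptions, and the directory
just needs to be writable by whatever process dumps the distilled articles
into it:

```nginx
server {
  server_name articles.example.com;
  # Directory where the temporarily archived articles are dumped
  root /srv/article-cache;

  location / {
    default_type text/html;
    try_files $uri =404;
  }
}
```
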
## Platypush

Create a new virtual environment and install Platypush with the `wallabag` and
`rss` plugin dependencies through `pip`:

```bash
python3 -m venv venv
source venv/bin/activate
pip install 'platypush[wallabag,rss]'
```

Then create a new configuration file `~/.config/platypush/config.yaml` with
the following configuration:

```yaml
# Web server configuration
backend.http:
  # Listens on port 8008 by default
  # port: 8008

# Wallabag configuration
wallabag:
  server_url: https://your-domain.com
  client_id: your_client_id
  client_secret: your_client_secret
  # Your Wallabag user credentials are required for the first login.
  # It's also advised to keep them here afterwards, so the refresh
  # token can be automatically updated.
  username: your_username
  password: your_password
```

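If you also want the automated RSS-to-Wallabag pipeline mentioned at the
beginning, you can extend the configuration with the `rss` plugin and an event
hook. A sketch along these lines - the feed URL and ntfy topic are examples,
and the event attribute interpolation should be double-checked against the
`rss` plugin and event documentation:

```yaml
# Optional: poll feeds and archive new entries to Wallabag
rss:
  subscriptions:
    - https://example.com/feed.xml   # Example feed URL
  poll_seconds: 21600                # Poll every 6 hours

# React to new feed entries as they are detected
event.hook.ArchiveNewFeedEntries:
  if:
    type: platypush.message.event.rss.NewFeedEntryEvent
  then:
    - action: wallabag.save
      args:
        url: ${url}
    # Optional: notify your devices through ntfy
    - action: ntfy.send_message
      args:
        topic: saved-articles
        message: ${title}
```
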
Then you can start the service with:

```bash
platypush
```

You can also create a systemd service to run Platypush in the background:

```bash
mkdir -p ~/.config/systemd/user

cat <<EOF > ~/.config/systemd/user/platypush.service
[Unit]
Description=Platypush service
After=network.target

[Service]
ExecStart=/path/to/venv/bin/platypush
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable --now platypush.service
```

After starting the service, head over to `http://your_platypush_host:8008` (or
the port you configured in the `backend.http` section) and create a new user
account.

It's also advised to serve the Platypush Web server behind a reverse proxy
with HTTPS support if you want it to be easily accessible from the browser
extension - a basic `nginx` configuration [is available on the
repo](https://git.platypush.tech/platypush/platypush/src/branch/master/examples/nginx/nginx.sample.conf).

## Platypush Web extension

You can install the Platypush Web extension from the [Firefox Add-ons
site](https://addons.mozilla.org/en-US/firefox/addon/platypush/).

After installing it, click on the extension popup and add the URL of your
Platypush Web server.

Once successfully connected, you should see the device in the main menu, where
you can run commands on it and save actions.

A good place to start familiarizing yourself with the Platypush API is the
_Run Action_ dialog, which allows you to run commands on your server and
provides autocomplete for the available actions, as well as documentation
about their arguments.

The default action mode is _Request_ (i.e. single requests against the API).
You can also pack multiple actions together on the backend [into
_procedures_](https://docs.platypush.tech/wiki/Quickstart.html#greet-me-with-lights-and-music-when-i-come-home),
which can be written either in the YAML config or as Python scripts (by
default loaded from `~/.config/platypush/scripts`). If correctly configured,
procedures will be available in the _Run Action_ dialog.

The other mode, which we'll use in this article, is _Script_. In this mode you
can write custom JavaScript code that can interact with your browser.

[Here](https://gist.github.com/BlackLight/d80c571705215924abc06a80994fd5f4) is
a sample script that you can use as a reference for the API exposed by the
extension. Some examples include:

- `app.run`, to run an action on the Platypush backend

- `app.getURL`, `app.setURL` and `app.openTab` to get and set the current URL,
  or open a new tab with a given URL

- `app.axios.get`, `app.axios.post` etc. to perform HTTP requests to other
  external services through the Axios library

- `app.getDOM` and `app.setDOM` to get and set the current page DOM

- `app.mercury.parse` to distill the current page content using the Mercury
  Parser API

### Reader mode script

We can put together the building blocks above to create our first script,
which will distill the current page content and swap the current page DOM with
the simplified content - with no ads, comments, or other distracting visual
elements. The full content of the script is available
[here](https://gist.manganiello.tech/fabio/c731b57ff6b24d21a8f43fbedde3dc30).

This is akin to what Firefox's [Reader
Mode](https://support.mozilla.org/en-US/kb/firefox-reader-view-clutter-free-web-pages)
does, but with much more room for customization.

Note that for this specific script we don't need any interactions with the
Platypush backend. Everything happens on the client, as the Mercury API is
built into the Platypush Web extension.

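The gist of it looks something like this - a simplified sketch, not the full
script linked above; the exact signature of `app.mercury.parse` and the
`title`/`content` fields are assumptions based on the standard Mercury Parser
output:

```javascript
// A simplified reader-mode sketch. `app` is the object injected by the
// Platypush extension.
const url = await app.getURL();
const article = await app.mercury.parse(url);

// Replace the page with a minimal, distraction-free layout
await app.setDOM(`
  <html>
    <body style="max-width: 800px; margin: 2em auto; font-family: serif">
      <h1>${article.title}</h1>
      ${article.content}
    </body>
  </html>
`);
```
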
Switch to _Script_ mode in the _Run Action_ dialog, paste the script content
and click on _Save Script_. You can also choose a custom name, icon
([FontAwesome](https://fontawesome.com/icons) icon classes are supported),
color and group for the script. Quite importantly, you can also associate a
keyboard shortcut with it, so you can quickly distill a page without having to
search for the command either in the extension popup or in the context menu.

### Save to Wallabag script

Now that we have a script to distill the current page content, we can create
another script that saves the distilled content (if available) to Wallabag,
falling back to the original page content otherwise.

The full content of the script is available
[here](https://gist.manganiello.tech/fabio/8f5b08d8fbaa404bafc6fdeaf9b154b4).

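Its core is along these lines - again a simplified sketch; the exact `app.run`
call format and the `wallabag.save` arguments are assumptions to check against
the plugin documentation:

```javascript
// Push the current URL to Wallabag through the Platypush backend.
const url = await app.getURL();

// Run the wallabag.save action on the Platypush server; Wallabag will
// fetch and parse the article from the given URL.
await app.run({
  action: 'wallabag.save',
  args: { url },
});
```
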