blog/markdown/Build-your-self-hosted-Ever...

1129 lines
51 KiB
Markdown
Raw Normal View History

[//]: # (title: Build your self-hosted Evernote)
2022-01-06 19:55:11 +01:00
[//]: # (description: How to use Platypush and other open-source tools to build a notebook synchronized across multiple devices)
[//]: # (image: /img/notebook.jpg)
[//]: # (author: Fabio Manganiello <fabio@platypush.tech>)
2022-01-06 19:55:11 +01:00
[//]: # (published: 2022-01-06)
## The need for an online _second brain_
When [Evernote](https://evernote.com) launched the idea of an online notebook as a sort of "second brain"
more than a decade ago, it resonated so much with what I had been trying to achieve for a while.
By then I already had tons of bookmarks, text files with read-it-later links, notes I had taken across multiple devices,
sketches I had taken on physical paper and drafts of articles or papers I was working on. All of this content used
to be sparse across many devices, it was painful to sync, and then Evernote came like water in a desert.
I have been a happy Evernote user until ~5-6 years ago, when I realized that the company had run out of
ideas, and I could no longer compromise with its decisions. If Evernote was supposed to be my second brain
then it should have been very simple to synchronize it with my filesystem and across multiple devices, but
that wasn't as simple as it sounds. Evernote had a primitive API, a primitive web clipper, no Linux client,
and, as it tried harder and harder to monetize its product, it put more and more features behind expensive tiers.
Moreover, Evernote experienced [data losses](https://www.cnet.com/news/thousands-of-evernote-users-affected-by-data-loss/),
[security breaches](https://thenextweb.com/insider/2013/03/05/after-major-data-breach-evernote-accelerates-plans-to-implement-two-factor-authentication/)
and [privacy controversies](https://www.forbes.com/sites/thomasbrewster/2016/12/14/worst-privacy-policy-evernote/#525cc6c71977)
that in my eyes made it unfit to handle something as precious as the notes from my life and my work.
I could not compromise with a product that would charge me $5 more a month just to have it running on an additional
device, especially when the product itself didn't look that solid to me. If Evernote was supposed to be my second brain
then I should have been able to take it with me wherever I wanted, without having to worry on how many devices I was
using it already, without having to fear future changes or more aggressive monetization policies that could have limited
my ability to use the product.
So I started my journey as a wanderer of note-taking and link-saving services. Yes, ideally I want something that can
do both: your digital brain consists both of the notes you've taken and the links you've saved.
I've tried many of them over the following years (Instapaper, Pocket, Readability, Mercury Reader, SpringPad, Google
Keep, OneNote, Dropbox Paper...), but eventually got dissatisfied by most of them:
1. In most of the cases those products fall into the note-taking category or web scraper/saver category, rarely both.
2. In most of the cases you have to pay a monthly/yearly fee for something as simple as storing and syncing text.
3. Many of the products above either lack an API to programmatically import/export/read data, or they put their APIs
behind some premium tiers. This is a no-go for me: if the company that builds the product goes down, the last thing
I want is my personal notes, links and bookmarks to go down with it with no easy way to get them out.
4. Most of those products don't have local filesystem sync features: everything only works in their app.
My dissatisfaction with the products on the market was a bit relieved when I discovered [Obsidian](https://obsidian.md/).
A Markdown-based, modern-looking, multi-device product that transparently stores your notes on your own local storage,
and it even provides plenty of community plugins? That covers all I want, it's almost too good to be true! And, indeed,
it is too good to be true. Obsidian [charges](https://obsidian.md/pricing) $8 a month just for syncing content across
devices (copying content to their own cloud), and $16 a month if you want to publish/share your content. Those are
unacceptably high prices for something as simple as synchronizing and sharing text files! This was the trigger that
motivated me to take the matter into my own hands, so I came up with the wishlist for my ideal "second brain" app:
1. It needs to be self-hosted. No cloud services involved: it's easy to put stuff on somebody else's cloud, it's
usually much harder to take it out, and cloud services are unreliable by definition - they may decide from a moment
to another that they aren't making enough money, charge more for some features you are using, while keeping your own
most precious data as hostage. Or, worse, they could go down and take all of your data with them.
2. Each device should have a local copy of my notebook, and it should be simple to synchronize changes across these
copies.
3. It ought to be Markdown-based. Markdown is portable, clean, easy to index and search, it can easily be converted to
HTML if required, but it's much less cumbersome to read and write, and it's easy to import/export. To give an idea of
the underestimated power and flexibility of Markdown, keep in mind that all the articles on
[the Platypush blog](https://blog.platypush.tech)
are static Markdown files on a local server that are converted on the fly to HTML before being served to your browser.
4. It needs to be able to handle my own notes, as well as parse and convert to Markdown web pages that I'd like to
save or read later.
5. It must be easy to add and modify content. Whether I want to add a new link from my browser session on my laptop,
phone or tablet, or type some text on the fly from my phone, or resume working on a draft from another device, I
should be able to do so with no friction, as if I were working always on the same device.
6. It needs to work offline. I want to be able to work on a blog article while I'm on a flight with no Internet
connection, and I expect the content to be automatically synced as soon as my device gets a connection.
7. It needs to be file-based. I'm sick of custom formats, arcane APIs and other barriers and pointless abstractions
between me and my text. The KISS rule applies here: if it's a text file, and it appears on my machine inside a
normal directory, then expose it as a text file, and you'll get primitives such as
read/create/modify/copy/move/delete for free.
8. It needs to encapsulate some good web scraping/parsing logic, so every web page can be distilled into a readable and
easily exportable Markdown format.
9. It needs to allow automated routines - for instance, automatically fetch new content from an RSS feed and download
it in readable format on the shared repository.
It looks like a long shopping list, but it actually doesn't take that much to implement it. It's time to get to the
whiteboard and design its architecture.
## High-level architecture
From a high-level perspective, the architecture we are trying to build resembles something like this:
![High-level architecture](../img/self-hosted-notebook-architecture.svg)
## The git repository
We basically use a git server as the repository for our notes and links. It could be a private repo on GitHub or Gitlab,
or even a static folder initialized as a git repo on a server accessible over SSH. There are many advantages in choosing
a versioning system like git as the source of truth for your notebook content:
1. _History tracking_ comes for free: it's easy to keep track of changes commit by different devices, as well as rollback
to previous versions - nothing is ever really lost.
2. _Easy synchronization_: pushing new content to your notes can be mapped to a `git push`, synchronizing new content on
other devices can be mapped to a `git pull`.
3. _Native Markdown-friendly interfaces_: both GitHub and Gitlab provide native good interfaces to visualize Markdown
content. Browsing and managing your notebook is as easy as browsing a git repo.
4. _Easy to import and export_: exporting your notebook to another device is as simple as running a `git clone`.
5. _Storage flexibility_: you can create the repo on a cloud instance, on a self-hosted instance, or on any machine
with an SSH interface. The repo can live anywhere, as long as it is accessible to the devices that you want to use.
So the first requirement for this project is to set up a git repository on whatever source you want to use a central
storage for your notebook. We have mainly three options for this:
#### Create a new repo on a GitHub/Gitlab cloud instance.
1. _Pros_: you don't have to maintain a git server, you just have to create a new project, and you have all the fancy
interfaces for managing files and viewing Markdown content.
2. _Cons_: it's not really 100% self-hosted, isn't it? :)
#### Host a Gitlab instance yourself.
1. _Pros_: plenty of flexibility when it comes to hosting. You can even run the server on a machine only accessible
from the outside over a VPN, which brings some nice security features and content encapsulation. Plus, you have a
modern interface like Gitlab to handle your files, and you can also easily set up repository automation through
web hooks.
2. _Cons_: installing and running a Gitlab instance is a process with its own learning curve. Plus, a Gitlab instance
is usually quite resource-hungry - don't run it on a Raspberry Pi if you want the user experience to be smooth.
#### Initialize an empty repository on any publicly accessible server (or accessible over VPN) with an SSH interface.
An often forgotten feature of git is that it's basically a wrapper on top of SSH, therefore you can create a repo on
the fly on any machine that runs an SSH server - no need for a full-blown web framework on top of it. It's as simple
as:
```bash
# Server machine
$ mkdir -p /home/user/notebook.git
$ cd /home/user/notebook.git
$ git init --bare
# Client machine
$ git clone user@remote-machine:/home/user/notebook.git
```
1. _Pros_: the most flexible option: you can run your notebook storage on literally anything that has a CPU, an SSH
interface and git.
2. _Cons_: you won't have a fancy native interface to manage your files, nor repository automation features such as
actions or web hooks (available with GitHub and Gitlab respectively).
## The Markdown web server
It may be handy to have a web server to access your notes and links from any browser, especially if your repository
doesn't live on GitHub/Gitlab, and therefore it doesn't have a native way to expose the files over the web.
Clone the notebook repo on the machine where you want to expose the Markdown web server and then install
[Madness](https://github.com/DannyBen/madness) and its dependencies:
```bash
$ sudo apt install ruby-full
$ gem install madness
```
Take note of where the `madness` executable was installed and create a new user systemd service file under
`~/.config/systemd/user/madness.service` to manage the server on your repo folder:
```ini
[Unit]
Description=Serve Markdown content over HTML
After=network.target
[Service]
ExecStart=/home/user/.gem/ruby/version/bin/madness /path/to/the/notebook --port 9999
Restart=always
RestartSec=10
[Install]
WantedBy=default.target
```
Reload the systemd daemon and start/enable the server:
```bash
$ systemctl --user daemon-reload
$ systemctl --user start madness
$ systemctl --user enable madness
```
If everything went well you can head your browser to `http://host:9999` and you should see the Madness interface with
your Markdown files.
![Madness interface screenshot](../img/madness-screenshot.png)
You can easily configure a [nginx reverse proxy](https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/)
or an [SSH tunnel](https://www.ssh.com/academy/ssh/tunneling) to expose the server outside of the local network.
## The MQTT broker
An MQTT broker is another crucial ingredient in this set up. It is used to asynchronously transmit events such as a
request to add a new URL or update the local repository copies.
Any of the open-source MQTT brokers out there should do the job. I personally use [Mosquitto](https://mosquitto.org/)
for most of my projects, but [RabbitMQ](https://www.rabbitmq.com/), [Aedes](https://github.com/moscajs/aedes) or any
other broker should all just work.
Just like the git server, you should also install the MQTT on a machine that is either publicly accessible, or it is
accessible over VPN by all the devices you want to use your notebook on. If you opt for a machine with a publicly
accessible IP address then it's advised to enable both SSL and username/password authentication on your broker, so
unauthorized parties won't be able to connect to it.
Taking the case of Mosquitto, the installation and configuration is pretty straightforward. Install the `mosquitto`
package from your favourite package manager, the installation process should also create a configuration file under
`/etc/mosquitto/mosquitto.conf`. In the case of an SSL configuration with username and password, you would usually
configure the following options:
```ini
# Usually 1883 for non-SSL connections, 8883 for SSL connections
port 8883
# SSL/TLS version
tls_version tlsv1.2
# Path to the certificate chain
cafile /etc/mosquitto/certs/chain.crt
# Path to the server certificate
certfile /etc/mosquitto/certs/server.crt
# Path to the server private key
keyfile /etc/mosquitto/certs/server.key
# Set to false to disable access without username and password
allow_anonymous false
# Password file, which contains username:password pairs
# You can create and manage a password file by following the
# instructions reported here:
# https://mosquitto.org/documentation/authentication-methods/
password_file /etc/mosquitto/passwords.txt
```
If you don't need SSL encryption and authentication on your broker (which is ok if you are running the broker on a
private network and accessing it from the outside over VPN) then you'll only need to set the `port` option.
After you have configured the MQTT broker, you can start it and enable it via `systemd`:
```bash
$ sudo systemctl start mosquitto
$ sudo systemctl enable mosquitto
```
You can then use an MQTT client like [MQTT Explorer](http://mqtt-explorer.com/) to connect to the broker and verify
that everything is working.
## The Platypush automation
Once the git repo and the MQTT broker are in place, it's time to set up Platypush on one of the machines where you want
to keep your notebook synchronized - e.g. your laptop.
In this context, Platypush is used to glue together the pieces of the sync automation by defining the following chains
of events:
1. When a file system change is detected in the folder where the notebook is cloned (for example because a note was
added, removed or edited), start a timer than within e.g. 30 seconds synchronizes the changes to the git repository
(the timer is used to throttle the frequency of update events). Then send a message to the MQTT `notebook/sync` topic
to tell the other clients that they should synchronize their copies of the repository.
2. When a client receives a message on `notebook/sync`, and the originator is different from the client itself (this is
necessary in order to prevent "sync loops"), pull the latest changes from the remote repository.
3. When a specific client (which will be in charge of scraping URLs and adding new remote content) receives a message on
the MQTT `notebook/save` topic with a URL attached, the content of the associated web page will be parsed and saved
to the notebook ("Save URL" feature).
The same automation logic can be set up on as many clients as you like.
The first step is to install the Redis server and Platypush on your client machine. For example, on a Debian-based
system:
```bash
# Install Redis
$ sudo apt install redis-server
# Start and enable the Redis server
$ sudo systemctl start redis-server
$ sudo systemctl enable redis-server
# Install Platypush
$ sudo pip install platypush
```
You'll then have to create a configuration file to tell Platypush which services you want to use. Our use-case will
require the following integrations:
- `mqtt` ([backend](https://docs.platypush.tech/platypush/backend/mqtt.html) and
[plugin](https://docs.platypush.tech/platypush/plugins/mqtt.html)), used to subscribe to sync/save topics and dispatch
messages to the broker.
- [`file.monitor` backend](https://docs.platypush.tech/platypush/backend/file.monitor.html), used to monitor changes to
local folders.
- [Optional] [`pushbullet`](https://docs.platypush.tech/platypush/plugins/pushbullet.html), or an alternative way to
deliver notifications to other devices (such as
[`telegram`](https://docs.platypush.tech/platypush/plugins/chat.telegram.html),
[`twilio`](https://docs.platypush.tech/platypush/plugins/twilio.html),
[`gotify`](https://docs.platypush.tech/platypush/plugins/gotify.html),
[`mailgun`](https://docs.platypush.tech/platypush/plugins/mailgun.html)). We'll use this to notify other clients when
new content has been added.
- [Optional] the [`http.webpage`](https://docs.platypush.tech/platypush/plugins/http.webpage.html) integration,
used to scrape a web page's content to Markdown or PDF.
Start by creating a `config.yaml` file with your integrations:
```yaml
# The name of your client
device_id: my-client
mqtt:
host: your-mqtt-server
port: 1883
# Uncomment the lines below for SSL/user+password authentication
# port: 8883
# username: user
# password: pass
# tls_cafile: ~/path/to/ssl.crt
# tls_version: tlsv1.2
# Specify the topics you want to subscribe here
backend.mqtt:
listeners:
- topics:
- notebook/sync
# The configuration for the file monitor follows.
# This logic triggers FileSystemEvents whenever a change
# happens on the specified folder. We can use these events
# to build our sync logic
backend.file.monitor:
paths:
# Path to the folder where you have cloned the notebook
# git repo on your client
- path: /path/to/the/notebook
recursive: true
# Ignore changes on non-content sub-folders, such as .git or
# other configuration/cache folders
ignore_directories:
- .git
- .obsidian
```
Then generate a new Platypush virtual environment from the configuration file:
```bash
$ platyvenv build -c config.yaml
```
Once the command has run, it should report a line like the following:
```
Platypush virtual environment prepared under /home/user/.local/share/platypush/venv/my-client
```
Let's call this path `$PREFIX`. Create a structure to store your scripts under `$PREFIX/etc/platypush` (a copy of the
`config.yaml` file should already be there at this point). The structure will look like this:
```conf
$PREFIX
-> etc
-> platypush
-> config.yaml # Configuration file
-> scripts # Scripts folder
-> __init__.py # Empty file
-> notebook.py # Logic for notebook synchronization
```
Let's proceed with defining the core logic in `notebook.py`:
```python
import logging
import os
import re
from threading import RLock, Timer
from platypush.config import Config
from platypush.event.hook import hook
from platypush.message.event.file import FileSystemEvent
from platypush.message.event.mqtt import MQTTMessageEvent
from platypush.procedure import procedure
from platypush.utils import run
logger = logging.getLogger('notebook')
repo_path = '/path/to/your/git/repo'
sync_timer = None
sync_timer_lock = RLock()
def should_sync_notebook(event: MQTTMessageEvent) -> bool:
"""
Only synchronize the notebook if a sync request came from
a source other than ourselves - this is required to prevent
"sync loops", where a client receives its own sync message
and broadcasts sync requests again and again.
"""
return Config.get('device_id') != event.msg.get('origin')
def cancel_sync_timer():
"""
Utility function to cancel a pending synchronization timer.
"""
global sync_timer
with sync_timer_lock:
if sync_timer:
sync_timer.cancel()
sync_timer = None
def reset_sync_timer(path: str, seconds=15):
"""
Utility function to start a synchronization timer.
"""
global sync_timer
with sync_timer_lock:
cancel_sync_timer()
sync_timer = Timer(seconds, sync_notebook, (path,))
sync_timer.start()
@hook(MQTTMessageEvent, topic='notebook/sync')
def on_notebook_remote_update(event, **_):
"""
This hook is triggered when a message is received on the
notebook/sync MQTT topic. It triggers a sync between the
local and remote copies of the repository.
"""
if not should_sync_notebook(event):
return
sync_notebook(repo_path)
@hook(FileSystemEvent)
def on_notebook_local_update(event, **_):
"""
This hook is triggered when a change (i.e. file/directory
create/update/delete) is performed on the folder where the
repository is cloned. It starts a timer to synchronize the
local and remote repository copies.
"""
if not event.path.startswith(repo_path):
return
logger.info(f'Synchronizing repo path {repo_path}')
reset_sync_timer(repo_path)
@procedure
def sync_notebook(path: str, **_):
"""
This function holds the main synchronization logic.
It is declared through the @procedure decorator, so you can also
programmatically call it from your requests through e.g.
`procedure.notebook.sync_notebook`.
"""
# The timer lock ensures that only one thread at the time can
# synchronize the notebook
with sync_timer_lock:
# Cancel any previously awaiting timer
cancel_sync_timer()
logger.info(f'Synchronizing notebook - path: {path}')
cwd = os.getcwd()
os.chdir(path)
has_stashed_changes = False
try:
# Check if the local copy of the repo has changes
git_status = run('shell.exec', 'git status --porcelain').strip()
if git_status:
logger.info('The local copy has changes: synchronizing them to the repo')
# If we have modified/deleted files then we stash the local changes
# before pulling the remote changes to prevent conflicts
has_modifications = any(re.match(r'^\s*[MD]\s+', line) for line in git_status.split('\n'))
if has_modifications:
logger.info(run('shell.exec', 'git stash', ignore_errors=True))
has_stashed_changes = True
# Pull the latest changes from the repo
logger.info(run('shell.exec', 'git pull --rebase'))
if has_modifications:
# Un-stash the local changes
logger.info(run('shell.exec', 'git stash pop'))
# Add, commit and push the local changes
has_stashed_changes = False
device_id = Config.get('device_id')
logger.info(run('shell.exec', 'git add .'))
logger.info(run('shell.exec', f'git commit -a -m "Automatic sync triggered by {device_id}"'))
logger.info(run('shell.exec', 'git push origin main'))
# Notify other clients by pushing a message to the notebook/sync topic
# having this client ID as the origin. As an alternative, if you are using
# Gitlab to host your repo, you can also configure a webhook that is called
# upon push events and sends the same message to notebook/sync.
run('mqtt.publish', topic='notebook/sync', msg={'origin': Config.get('device_id')})
else:
# If we have no local changes, just pull the remote changes
logger.info(run('shell.exec', 'git pull'))
except Exception as e:
if has_stashed_changes:
logger.info(run('shell.exec', 'git stash pop'))
# In case of errors, retry in 5 minutes
reset_sync_timer(path, seconds=300)
raise e
finally:
os.chdir(cwd)
logger.info('Notebook synchronized')
```
Now you can start the newly configured environment:
```bash
$ platyvenv start my-client
```
Or create a systemd user service for it under `~/.config/systemd/user/platypush-notebook.service`:
```bash
$ cat <<EOF > ~/.config/systemd/user/platypush-notebook.service
[Unit]
Description=Platypush notebook automation
After=network.target
[Service]
ExecStart=/path/to/platyvenv start my-client
ExecStop=/path/to/platyvenv stop my-client
Restart=always
RestartSec=10
[Install]
WantedBy=default.target
EOF
$ systemctl --user daemon-reload
$ systemctl --user start platypush-notebook
$ systemctl --user enable platypush-notebook
```
While the service is running, try and create a new Markdown file under the monitored repository local copy. Within a
few seconds the automation should be triggered and the new file should be automatically pushed to the repo. If you are
running the code on multiple hosts, then those should also fetch the updates within seconds. You can also run an
instance on the same server that runs Madness to synchronize its copy of the repo, and your web instance will remain in
sync with any updates. Congratulations, you have set up a distributed network to synchronize your notes!
## Android setup
You may probably want a way to access your notebook also on your phone and tablet, and keep the copy on your mobile
devices automatically in sync with the server.
Luckily, it is possible to install and run Platypush on Android through [`Termux`](https://termux.com/), and the logic
you have set up on your laptops and servers should also work flawlessly on Android. Termux allows you to run a
Linux environment in user mode with no need for rooting your device.
First, install the [`Termux` app](https://f-droid.org/packages/com.termux/) on your Android device. Optionally, you may
also want to install the following apps:
- [`Termux:API`](https://f-droid.org/en/packages/com.termux.api/): to programmatically access Android features (e.g.
SMS texts, camera, GPS, battery level etc.) from your scripts.
- [`Termux:Boot`](https://f-droid.org/en/packages/com.termux.boot/): to start services such as Redis and Platypush at
boot time without having to open the Termux app first (advised).
- [`Termux:Widget`](https://f-droid.org/en/packages/com.termux.widget/): to add scripts (for example to manually start
Platypush or synchronize the notebook) on the home screen.
- [`Termux:GUI`](https://f-droid.org/en/packages/com.termux.gui/): to add support for visual elements (such as dialogs
and widgets for sharing content) to your scripts.
After installing Termux, open a new session, update the packages, install `termux-services` (for services support) and
enable SSH access (it's usually more handy to type commands on a physical keyboard than a phone screen):
```bash
$ pkg update
$ pkg install termux-services openssh
# Start and enable the SSH service
$ sv up sshd
$ sv-enable sshd
# Set a user password
$ passwd
```
A service that is enabled through `sv-enable` will be started when a Termux session is first opened, but not at boot
time unless Termux is started. If you want a service to be started a boot time, you need to install the `Termux:Boot`
app and then place the scripts you want to run at boot time inside the `~/.termux/boot` folder.
After starting `sshd` and setting a password, you should be able to log in to your Android device over SSH:
```bash
$ ssh -p 8022 anyuser@android-device
```
The next step is to enable access for Termux to the internal storage (by default it can only access the app's own data
folder). This can easily be done by running `termux-setup-storage` and allowing storage access on the prompt. We may
also want to disable battery optimization for Termux, so the services won't be killed in case of inactivity.
Then install git, Redis, Platypush and its Python dependencies, and start/enable the Redis server:
```bash
$ pkg install git redis python3
$ pip install platypush
```
If running the `redis-server` command results in an error, then you may need to explicitly disable a warning for a COW
bug for ARM64 architectures in the Redis configuration file. Simply add or uncomment the following line in
`/data/data/com.termux/files/usr/etc/redis.conf`:
```
ignore-warnings ARM64-COW-BUG
```
We then need to create a service for Redis, since it's not available by default. Termux doesn't use systemd to manage
services, since that would require access to the PID 1, which is only available to the root user. Instead, it uses it
own system of scripts that goes under the name of [_Termux services_](https://wiki.termux.com/wiki/Termux-services).
Services are installed under `/data/data/com.termux/files/usr/var/service`. Just `cd` to that directory and copy the
available `sshd` service to `redis`:
```bash
$ cd /data/data/com.termux/files/usr/var/service
$ cp -r sshd redis
```
Then replace the content of the `run` file in the service directory with this:
```bash
#!/data/data/com.termux/files/usr/bin/sh
exec redis-server 2>&1
```
Then restart Termux so that it refreshes its list of services, and start/enable the Redis service (or create a boot
script for it):
```bash
$ sv up redis
$ sv-enable redis
```
Verify that you can access the `/sdcard` folder (shared storage) after restarting Termux. If that's the case, we can
now clone the notebook repo under `/sdcard/notebook`:
```bash
$ git clone git-url /sdcard/notebook
```
The steps for installing and configuring the Platypush automation are the same shown in the previous section, with the
following exceptions:
- `repo_path` in the `notebook.py` script needs to point to `/sdcard/notebook` - if the notebook is cloned on the user's
home directory then other apps won't be able to access it.
- If you want to run it in a service, you'll have to follow the same steps illustrated for Redis instead of creating
a systemd service.
You may also want to redirect the Platypush stdout/stderr to a log file, since Termux messages don't have the same
sophisticated level of logging provided by systemd. The startup command should therefore look like:
```bash
platyvenv start my-client > /path/to/logs/platypush.log 2>&1
```
Once everything is configured and you restart Termux, Platypush should automatically start in the background - you can
check the status by running a `tail` on the log file or through the `ps` command. If you change a file in your notebook
on either your Android device or your laptop, everything should now get up to date within a minute.
Finally, we can also leverage `Termux:Shortcuts` to add a widget to the home screen to manually trigger the sync
process - maybe because an update was received while the phone was off or the Platypush service was not running.
Create a `~/.shortcuts` folder with a script inside named e.g. `sync_notebook.sh`:
```bash
#!/data/data/com.termux/files/usr/bin/bash
cat <<EOF | python
from platypush.utils import run
run('mqtt.publish', topic='notebook/sync', msg={'origin': None})
EOF
```
This script leverages the `platypush.utils.run` method to send a message to the `notebook/sync` MQTT topic with no
`origin` to force all the subscribed clients to pull the latest updates from the remote server.
You can now browse to the widgets' menu of your Android device (usually it's done by long-pressing an empty area on the
launcher), select _Termux shortcut_ and then select your newly created script. By clicking on the icon you will force a
sync across all the connected devices.
Once Termux is properly configured, you don't need to repeat the whole procedure on other Android devices. Simply use
the [Termux backup](https://wiki.termux.com/wiki/Backing_up_Termux) scripts to back up your whole configuration and
copy it/restore it on another device, and you'll have the whole synchronization logic up and running.
## The Obsidian app
Now that the backend synchronization logic is in place, it's time to move to the frontend side. As mentioned earlier,
Obsidian is an option I really like - it has a modern interface, it's cross-platform, it's
[electronjs-based](https://www.electronjs.org/), it has many plugins, it relies on simple Markdown, and it just needs
a local folder to work. As mentioned earlier, you would normally need to subscribe to Obsidian Sync in order to
synchronize notes across devices, but now you've got a self-synchronizing git repo copy on any device you like. So just
install Obsidian on your computer or mobile, point it to the local copy of the git notebook, and you're set to go!
![Obsidian screenshot](../img/obsidian-screenshot.png)
## The NextCloud option
Another nice option is to synchronize your notebook across multiple devices is to use a
[NextCloud](https://nextcloud.com/) instance. NextCloud provides a [Notes app](https://apps.nextcloud.com/apps/notes)
that already supports notes in Markdown format, and it also comes with an
[Android app](https://f-droid.org/en/packages/it.niedermann.owncloud.notes/).
If that's the way you want to go, you can still have notes<->git synchronization by simply setting up the Platypush
notebook automation on the server where NextCloud is running. Just clone the repository to your NextCloud Notes folder:
```bash
$ git clone git-url /path/to/nextcloud/data/user/files/Notes
```
And then set the `repo_path` in `notebook.py` to this directory.
Keep in mind however that local changes in the `Notes` folder will not be synchronized to the NextCloud app until the
next cron is executed. If you want the changes to be propagated as soon as they are pushed to the git repo, then you'll
have to add an extra piece of logic to the script that synchronizes the notebook, in order to rescan the `Notes` folder
for changes. Also, Platypush will have to run with the same user that runs the NextCloud web server, because of the
requirements for executing the `occ` script:
```python
import logging
from platypush.utils import run
...
logger = logging.getLogger('notebook')
# Path to the NextCloud occ script
occ_path = '/srv/http/nextcloud/occ'
...
def sync_notebook(path: str, **_):
...
refresh_nextcloud()
def refresh_nextcloud():
logger.info(run('shell.exec', f'php {occ_path} files:scan --path=/nextcloud-user/files/Notes'))
logger.info(run('shell.exec', f'php {occ_path} files:cleanup'))
```
Your notebook is now synchronized with NextCloud, and it can be accessed from any NextCloud client!
## Automation to parse and save web pages
Now that we have a way to keep our notes synchronized across multiple devices and interfaces, let's explore how we can
parse web pages and save them in our notebook in Markdown format - we may want to read them later on another device,
read the content without all the clutter, or just keep a persistent track of the articles that we have read.
Elect a notebook client to be in charge of scraping and saving URLs. This client will have a configuration like this:
```yaml
# The name of your client
device_id: my-client
mqtt:
host: your-mqtt-server
port: 1883
# Uncomment the lines below for SSL/user+password authentication
# port: 8883
# username: user
# password: pass
# tls_cafile: ~/path/to/ssl.crt
# tls_version: tlsv1.2
# Specify the topics you want to subscribe here
backend.mqtt:
listeners:
- topics:
- notebook/sync
# notebook/save will be used to send parsing requests
- notebook/save
# Monitor the local repository copy for changes
backend.file.monitor:
paths:
# Path to the folder where you have cloned the notebook
# git repo on your client
- path: /path/to/the/notebook
recursive: true
# Ignore changes on non-content sub-folders, such as .git or
# other configuration/cache folders
ignore_directories:
- .git
- .obsidian
# Enable the http.webpage integration for parsing web pages
http.webpage:
enabled: true
# We will use Pushbullet to send a link to all the connected devices
# with the URL of the newly saved link, but you can use any other
# services for delivering notifications and/or messages - such as
# Gotify, Twilio, Telegram or any email integration
backend.pushbullet:
token: my-token
device: my-client
pushbullet:
enabled: true
```
Build an environment from this configuration file:
```bash
$ platyvenv build -c config.yaml
```
Make sure that at the end of the process you have the `node` and `npm` executables installed - the `http.webpage`
integration uses the [Mercury Parser](https://github.com/postlight/mercury-parser) API to convert web pages to Markdown.
Then copy the previously created `scripts` folder under `<environment-base-dir>/etc/platypush/scripts`. We now want to
add a new script (let's name it e.g. `webpage.py`) that is in charge of subscribing to new messages on `notebook/save`
and use the `http.webpage` integration to save its content in Markdown format in the repository folder. Once the parsed
file is in the right directory, the previously created automation will take care of synchronizing it to the git repo.
```python
import logging
import os
import re
import shutil
import tempfile
from datetime import datetime
from typing import Optional
from urllib.parse import quote
from platypush.event.hook import hook
from platypush.message.event.mqtt import MQTTMessageEvent
from platypush.procedure import procedure
from platypush.utils import run
logger = logging.getLogger('notebook')
repo_path = '/path/to/your/notebook/repo'
# Base URL for your Madness Markdown instance
markdown_base_url = 'https://my-host/'
@hook(MQTTMessageEvent, topic='notebook/save')
def on_notebook_url_save_request(event, **_):
"""
Subscribe to new messages on the notebook/save topic.
Such messages can contain either a URL to parse, or a
note to create - with specified content and title.
"""
url = event.msg.get('url')
content = event.msg.get('content')
title = event.msg.get('title')
save_link(url=url, content=content, title=title)
@procedure
def save_link(url: Optional[str] = None, title: Optional[str] = None, content: Optional[str] = None, **_):
assert url or content, 'Please specify either a URL or some Markdown content'
# Create a temporary file for the Markdown content
f = tempfile.NamedTemporaryFile(suffix='.md', delete=False)
if url:
logger.info(f'Parsing URL {url}')
# Parse the webpage to Markdown to the temporary file
response = run('http.webpage.simplify', url=url, outfile=f.name)
title = title or response.get('title')
# Sanitize title and filename
if not title:
title = f'Note created at {datetime.now()}'
title = title.replace('/', '-')
if content:
with open(f.name, 'w') as f:
f.write(content)
# Download the Markdown file to the repo
filename = re.sub(r'[^a-zA-Z0-9 \-_+,.]', '_', title) + '.md'
outfile = os.path.join(repo_path, filename)
shutil.move(f.name, outfile)
os.chmod(outfile, 0o660)
logger.info(f'URL {url} successfully downloaded to {outfile}')
# Send the URL
link_url = f'{markdown_base_url}/{quote(title)}'
run('pushbullet.send_note', title=title, url=link_url)
```
We now have a service that can listen for messages delivered on `notebook/save`. If the message contains some Markdown
content, it will directly save it to the notebook. If it contains a URL, it will use the `http.webpage` integration to
parse the web page and save it to the notebook. What we need now is a way to easily send messages to this channel while
we are browsing the web. A common use-case is the one where you are reading an article on your browser (either on a
computer or a mobile device) and you want to save it to your notebook to read it later through a mechanism similar to
the familiar _Share_ button. Let's break down this use-case in two:
- The desktop (or laptop) case
- The mobile case
### Sharing links from the desktop
If you are reading an article on your personal computer and you want to save it to your notebook (for example to read
it later on your mobile) then you can use the
[Platypush browser extension](https://git.platypush.tech/platypush/platypush-webext) to create a simple action that
sends your current tab to the `notebook/save` MQTT channel.
Download the extension on your browser ([Firefox version](https://addons.mozilla.org/en-US/firefox/addon/platypush/),
[Chrome version](https://chrome.google.com/webstore/detail/platypush/aphldjclndofhflbbdnmpejbjgomkbie)) - more
information about the Platypush browser extension is available in a
[previous article](https://blog.platypush.tech/article/One-browser-extension-to-rule-them-all). Then, click on the
extension icon in the browser and add a new connection to a Platypush host - it could either be your own machine or any
of the notebook clients you have configured.
Side note: the extension only works if the target Platypush machine has `backend.http` (i.e. the web server) enabled,
as it is used to dispatch messages over the Platypush API. This wasn't required by the previous set up, but you can now
select one of the devices to expose a web server by simply adding a `backend.http` section to the configuration file and
setting `enabled: True` (by default the web server will listen on the port 8008).
![Platypush web extension first screen](../img/extension-2.png)
![Platypush web extension second screen](../img/extension-3.png)
Then from the extension configuration panel select your host -> Run Action. Wait for the autocomplete bar to populate
(it may take a while the first time, since it has to inspect all the methods in all the enabled packages) and then
create a new `mqtt.publish` action that sends a message with the current URL over the `notebook/save` channel:
![URL save extension action](../img/self-hosted-notebook-extension-1.png)
Click on the _Save Action_ button at the bottom of the page, give your action a name and, optionally, an icon, a color
and a set of tags. You can also select a keybinding between Ctrl+Alt+0 and Ctrl+Alt+9 to automatically run your action
without having to grab the mouse.
Now browse to any web page that you want to save, run the action (either by clicking on the extension icon and
selecting it or through the keyboard shortcut) and wait a couple of seconds. You should soon receive a Pushbullet
notification with a link to the parsed content and the repo should get updated as well on all of your devices.
### Sharing links from mobile devices
An easy way to share links to your notebook through an Android device is to leverage
[Tasker](https://tasker.joaoapps.com/) with the [AutoShare](https://joaoapps.com/autoshare/what-it-is/) plugin, and
choose an app like [MQTT Client](https://play.google.com/store/apps/details?id=in.dc297.mqttclpro) that comes with a
Tasker integration. You may then create a new AutoShare intent named e.g. _Save URL_, create a Tasker task associated
to it that uses the MQTT Client integration to send the message with the URL to the right MQTT topic. When
you are browsing a web page that you'd like to save then you simply click on the _Share_ button and select
_AutoShare Command_ in the popup window, then select the action you have created.
However, even though I really appreciate the features provided by Tasker, its ecosystem and the developer behind it
(I have been using it for more than 10 years), I am on a path of moving more and more of my automation away from it.
Firstly, because it's a paid app with paid services, and the whole point of setting up this whole automation is to
have the same quality of a paid service without having to pay for - we host it, we own it. Secondly, it's not an
open-source app, and it's notably tricky to migrate configurations across devices.
Termux also provides a mechanism for [intents and hooks](https://wiki.termux.com/wiki/Intents_and_Hooks), and we can
easily create a sharing intent for the notebook by creating a script under `~/bin/termux-url-opener`. Make sure that
the binary file is executable and that you have `Termux:GUI` installed for support for visual widgets:
```bash
#!/data/data/com.termux/files/usr/bin/bash
arg="$1"
# termux-dialog-radio show a list of mutually exclusive options and returns
# the selection in JSON format. The options need to be provided over the -v
# argument and they are comma-separated
action=$(termux-dialog radio -t 'Select an option' -v 'Save URL,some,other,options' | jq -r '.text')
case "$action" in
'Save URL')
cat <<EOF | python
from platypush.utils import run
run('mqtt.publish', topic='notebook/save', msg={'url': '$arg'})
EOF
;;
# You can add some other actions here
esac
```
Now browse to a page that you want to save from your mobile device, tap the _Share_ button, select _Termux_ and select
the _Save URL_ option. Everything should work out of the box.
## Delivering RSS digests to your notebook
As a last step in our automation set up, let's consider the use-case where you want a digest of the new content from
your favourite source (your favourite newspaper, magazine, blog etc.) to be automatically delivered on a periodic basis
to your notebook in readable format.
It's relatively easy to set up such automation with the building blocks we have put in place and the Platypush
[`rss`](https://docs.platypush.tech/platypush/plugins/rss.html) integration. Add an `rss` section to the configuration
file of any of your clients with the `http.webpage` integration. It will contain the RSS sources you want to subscribe
to:
```yaml
rss:
subscriptions:
- https://source1.com/feed/rss
- https://source2.com/feed/rss
- https://source3.com/feed/rss
```
Then either rebuild the virtual environment (`platyvenv build -c config.yaml`) or manually install the required
dependency in the existing environment (`pip install feedparser`).
The RSS integration will trigger a
[`NewFeedEntryEvent`](https://docs.platypush.tech/platypush/events/rss.html#platypush.message.event.rss.NewFeedEntryEvent)
whenever an entry is added to an RSS feed you are subscribed to. We now want to create a logic that reacts to such
events and does the following:
1. Whenever a new entry is created on a subscribed feed, add the corresponding URL to a queue of links to process
2. A cronjob that runs on a specified basis will collect all the links in the queue, parse the content of the webpages
and save them in a `digests` folder on the notebook.
Create a new script under `$PREFIX/etc/platypush/scripts` named e.g. `digests.py`:
```python
import logging
import pathlib
import os
import tempfile
from datetime import datetime
from multiprocessing import RLock
from platypush.cron import cron
from platypush.event.hook import hook
from platypush.message.event.rss import NewFeedEntryEvent
from platypush.utils import run
from .notebook import repo_path
logger = logging.getLogger('digest-generator')
# Path to a text file where you'll store the processing queue
# for the feed entries - one URL per line
queue_path = '/path/to/feeds/processing/queue'
# Lock to ensure consistency when writing to the queue
queue_path_lock = RLock()
# The digests path will be a subfolder of the repo_path
digests_path = f'{repo_path}/digests'
@hook(NewFeedEntryEvent)
def on_new_feed_entry(event, **_):
"""
Subscribe to new RSS feed entry events and add the
corresponding URLs to a processing queue.
"""
with queue_path_lock:
with open(queue_path, 'a') as f:
f.write(event.url + '\n')
@cron('0 4 * * *')
def digest_generation_cron(**_):
"""
This cronjob runs every day at 4AM local time.
It processes all the URLs in the queue, it generates a digest
with the parsed content and it saves it in the notebook folder.
"""
logger.info('Running digest generation cronjob')
with queue_path_lock:
try:
with open(queue_path, 'r') as f:
md_files = []
for url in f:
# Create a temporary file for the Markdown content
tmp = tempfile.NamedTemporaryFile(suffix='.md', delete=False)
logger.info(f'Parsing URL {url}')
# Parse the webpage to Markdown to the temporary file
response = run('http.webpage.simplify', url=url, outfile=tmp.name)
title = response.get('title', url)
md_files.append(tmp.name)
except FileNotFoundError:
pass
if not md_files:
logger.info('No URLs to process')
return
try:
pathlib.Path(digests_path).mkdir(parents=True, exist_ok=True)
digest_file = os.path.join(digests_path, f'{datetime.now()}_digest')
digest_content = f'# Digest generated on {datetime.now()}\n\n'
for md_file in md_files:
with open(md_file, 'r') as f:
digest_content += f.read() + '\n\n'
with open(digest_file, 'w') as f:
f.write(digest_content)
# Clean up the queue
os.unlink(queue_path)
finally:
for md_file in md_files:
os.unlink(md_file)
```
Now restart the Platypush service. On the first start after configuring the `rss` integration it should trigger a bunch
of `NewFeedEntryEvent` with all the newly seen content from the subscribed feed. Once the cronjob runs, it will process
all these pending requests and it will generate a new digest in your notebook folder. Since we previously set up an
automation to monitor changes in this folder, the newly created file will trigger a git sync as well as broadcast sync
request on MQTT. At there you go - your daily or weekly subscriptions, directly delivered to your custom notebook!
## Conclusions
In this article we have learned:
1. How to design a distributed architecture to synchronize content across multiple devices using Platypush scripts as
the glue between a git repository and an MQTT broker.
2. How to manage a notebook based on Markdown and which popular options are available for the visualization -
Github/Gitlab, Obsidian, NextCloud Notes, Madness.
3. How to install a Platypush virtual environment on the fly from a configuration file through `platyvenv` command
(in the previous articles I mainly targeted manual installations). Just for you to know, a `platydock` command is
also available to create Docker containers on the fly from a configuration file, but given the hardware requirements
or specific dependency chains that some integrations may require the mileage of `platydock` may vary.
4. How to install and run Platypush directly on Android through Termux. This is actually quite huge: in this specific
article we targeted a use-case for folder synchronization between mobile and desktop, but given the high number of
integrations provided by Platypush, as well as the powerful scripts provided by `Termux:API`, it's relatively easy
to use Platypush to set up automations that replace the need of paid (and closed-source) services like Tasker.
5. How to use the `http.webpage` integration to distill web pages into readable Markdown.
6. How to push links to our automation chain through a desktop browser (using the Platypush browser extension) or mobile
(using the `termux-url-opener` mechanism).
7. How to use the `rss` integration to subscribe to feeds, and how to hook it to `http.webpage` and cronjobs to generate
periodic digests delivered to our notebook.
You should now have some solid tools to build your own automated notebook. A few ideas on possible follow-ups:
1. Use your notebook to manage databases (a feature provided by Notion) in CSV format.
2. Set up a similar distributed sync mechanism to synchronize photos across devices.
3. Host your own Markdown-based wiki or website built on top of such an automation pipeline, so on each update the
website is automatically refreshed with the new content.
Happy hacking!