blog/markdown/One-browser-extension-to-rule-them-all.md
Fabio Manganiello f34f1f6232 Refactored Platypush blog repo.
Removed all the Python logic + templates and styles.
Those have now been moved to a stand-alone project (madblog),
therefore this repo should only contain the static blog
pages and images.
2022-01-12 00:56:39 +01:00

515 lines
27 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[//]: # (title: One web extension to rule them all)
[//]: # (description: How to use the Platypush web extension to customize your browser and connect things together.)
[//]: # (image: /img/extension-1.png)
[//]: # (author: Fabio Manganiello <fabio@platypush.tech>)
[//]: # (published: 2020-07-07)
Once upon a time, there was a worldwide web where web extensions were still new toys to play with and the major browsers
that supported them (namely Firefox and Chrome) didnt mind providing them with very wide access to their internals and
APIs to do (more or less) whatever they pleased. The idea was that these browser add-ons/apps/extensions (the lines
between these were still quite blurry at the time) could become a powerful way to run within a browser (even locally and
without connecting to another website) any piece of software the user wanted to run.
It was an age when powerful extensions spawned that could also deeply change many things in the browser (like the
now-defunct [Vimperator](http://vimperator.org/) that could completely redesign the UI of the browser to make it look
and behave like vim), and
[user scripts](https://en.wikipedia.org/wiki/Userscript) were a powerful way users could leverage to run anything they
liked wherever they liked. I used to use Vimperator custom scripts a lot to map whichever sequence of keys I wanted to
whichever custom action I wanted — just modeled as plain JavaScript. And I used to use user scripts a lot, as well —
those still exist, but with many more limitations than before.
That age of wild West of web extensions and apps is largely gone by now. It didnt take long before malicious actors
realized that the freedom given to web extensions made them a perfect vector to run malware/spyware directly within the
browser that, in many cases, could bypass several anti-malware layers. And that generation of web extensions had another
issue with fragmentation. Firefox and Chrome had developed their own APIs (like
Mozillas [XUL](https://www.zdnet.com/article/mozilla-changes-firefox-apis-developers-unhappy/)
and [Chrome Apps](https://blog.chromium.org/2016/08/from-chrome-apps-to-web.html)) that didnt have much overlap. That
made the task of developing a web extension that targeted multiple browsers a very expensive experience, and many
extensions and apps were only available for a particular browser.
The case for greater security, separation of concerns, and less fragmentation drove the migration towards the modern
[WebExtension API](https://wiki.mozilla.org/WebExtensions). Around the end of 2017,
both [Mozilla](https://www.zdnet.com/article/mozilla-changes-firefox-apis-developers-unhappy/)
and [Google](https://blog.chromium.org/2016/08/from-chrome-apps-to-web.html) ended the support for the previous APIs in
the respective browsers. They also added more restrictions for the add-ons and scripts not approved on their stores (
recent versions of Firefox only allow you to permanently install extensions published on the store) and added more
constraints and checks in their review processes.
The new API has made it harder for malicious actors to hack a user through the browser, and it also has greatly reduced
the barriers required to develop a cross-browser extension. On the other hand, however, it has also greatly reduced the
degrees of freedom offered to extensions. Several extensions that required deep integration with the browser (like
Vimperator and Postman) decided to either migrate to stand-alone apps or just abandon their efforts. And user scripts
have become more niche geeky features with more limitations than before offered by third-party extensions like
[Greasemonkey](https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/)/[Tampermonkey](https://chrome.google.com/webstore/detail/tampermonkey/dhdgffkkebhmkfjojejmpbldmpobfkfo?hl=en).
Firefoxs recent [user-scripts API](https://www.ghacks.net/2019/03/27/firefox-support-user-scripts-api/) is a
promising alternative for reviving the power of the past wave, but so far its only supported by Firefox.
---
As a power user, while I understand all the motivations that led browser developers to the decision of more
fencing/sandboxing for the extensions, I still miss those times when we could deeply customize our browser and what it
could do however we liked it. I built Platypush over the years to solve my need for endless extensibility and
customization on the backend side, with everything provided by a uniform and coherent API and platform. I thought that
applying the same philosophy to the context of my web browser would have been the natural next step. With the Platypush
web extension, Ive tried to build a solution for several needs faced by many power users.
First, weve got several backend solutions to run things around, and smart home devices to do things and pass
information around. But the dear ol desktop web browser has often been left behind in this progress in automation, even
if many people still spend a lot of time on the web through desktop devices. Most of the front-end solutions for
cloud/home automation come through mobile apps. Some of the solutions for automation provide a web app/panel (and
Platypush does it as well), but the web panel is receiving less and less attention in an increasingly mobile-centric
world.
And even when your solution provides a web app, theres another crucial factor to take into account: the time to action.
How much time passes between you thinking “Id like to run this action on that device” and the action actually being
executed on that device? And remember that, especially when it comes to smart devices, the time-to-action in the “smart”
way (like you toggling a light-bulb remotely) should never be longer than the time-to-action in the “dumb” way (like you
standing up and toggling a switch). Thats your baseline.
When Im doing some work on my laptop I may sometimes want to run some action on another device — like send a link to my
phone, turn on the lights or the fan, play the video currently playing on the laptop on my media center, play the
Spotify playlist playing in my bedroom in my living room — or the other way around — and so on. Sure, for some of these
problems theres a Platypush/HomeAssistant/OpenHAB/BigCorp Inc. front-end solution, but that usually involves either you
getting the hands off your laptop to grab your phone, or opening/switching to the tab with the web app provided by your
platform, searching for the right menu/option, scrolling a bit, and then running the action. Voice assistants are
another option (and Platypush [provides integrations](https://blog.platypush.tech/article/Build-custom-voice-assistants)
that give you access to many of the voice technologies around), but talking your way through the day to run anything
isnt yet the frictionless and fast process many want — nor it should be the only way. Minimizing the time-to-action for
me means to be able to run that action on the fly (ideally within a maximum of three clicks or keystrokes) from any tab
or from the toolbar itself, regardless of the action.
Sure, there are some web extensions to solve some of those problems. But that usually involves:
- Relying on someone elses solution for your problem, and that solution isnt necessarily the most optimal for your use
case.
- Polluting your browser with lots of extensions in order to execute different types of actions. Sending links to other
devices may involve installing the Pushbullet/Join extension, playing media on Kodi another extension, playing media
on the Chromecast another extension, saving links to Instapaper/Evernote/Pocket or other extensions, sharing on
Twitter/Facebook yet more extensions, controlling your smart home hub yet another extension… and the list goes on,
until your browsers toolbar is packed with icons, and you cant even recall what some of them do — defeating the
whole purpose of optimizing the time-to-action from the context of the web browser.
- And, of course, installing too many extensions increases the potential area of surface for attacks against your
browser — and thats the problem that the WebExtensions API was supposed to solve in the first place.
I first started this journey by building a simple web extension that I could use to quickly debug Platypush commands
executed on other RaspberryPis and smart devices around my house over web API/websocket/MQTT. Then, I realized that I
could use the same solution to solve my problem of optimizing the time-to-action — i.e. the problem of “I want to switch
on the lights right now without either grabbing my phone or switching tabs or standing up, while Im working on my
Medium article on the laptop.” And that means either from the toolbar itself (preferably with all the actions grouped
under the same extension button and UI) or through the right-click context menu, like a native browser action. The
ability to run any Platypush action from my browser on any remote device meant that I could control any device or remote
API from the same interface, as long as there is a Platypush plugin to interact with that device/API.
But that target wasnt enough for me yet. Not all the actions that I may want to run on the fly from whichever location
in the browser could be translated to an atomic Platypush action. Platypush remote procedures can surely help with
running more complex logic on the backend, but I wanted the extension to also cover my use cases that require
interaction with the browser context — things like “play this video on my Chromecast (yes, even if Im on Firefox)”,
“translate this page and make sure that the result doesnt look like a 1997 website (yes, even if Im on Firefox)”,
“download this Magnet link directly on my NAS”, and so on. All the way up to custom event hooks that could react to
Platypush events triggered by other devices with custom logic running in the browser — things like “synchronize the
clipboard on the laptop if another Platypush device sends
a [`ClipboardEvent`](https://docs.platypush.tech/en/latest/platypush/events/clipboard.html)”, “send a notification
to the browser with the spoken text when the Google Assistant plugin triggers
a [`ResponseEvent`](https://docs.platypush.tech/en/latest/platypush/events/assistant.html#platypush.message.event.assistant.ResponseEvent)”
, or when a sensor goes above a certain threshold, and so on.
I wanted the ability to define all of these actions through a JavaScript native API similar to that provided by
Greasemonkey/Tampermonkey. But while most of the user scripts provided by those extensions only run within the context
of a web page, I wanted to decouple my script snippets from the web page and build an API that provides access to both
the browser context, to the Platypush actions available on any other remote device, to run background code in response
to custom events, and to synchronize the configuration easily across devices. So lets briefly go through the extension
to see what you can do with it.
## Installation and usage
First, you need a Platypush service running somewhere. If you havent tried it before, refer to any of the links in the
previous sections to get started (Ive made sure that installing, configuring, and starting a base environment doesnt
take longer than five minutes, I promise :) ). Also, make sure that you enable
the [HTTP backend](https://docs.platypush.tech/en/latest/platypush/backend/http.html) in the `config.yaml`, as the
webserver is the channel used by the extension to communicate with the server. Once you have a Platypush instance
running on e.g. a RaspberryPi, another server or your laptop, get the web extension:
- [Firefox link](https://addons.mozilla.org/en-US/firefox/addon/platypush/)
- [Chrome link](https://chrome.google.com/webstore/detail/platypush/aphldjclndofhflbbdnmpejbjgomkbie?hl=en-GB&authuser=0)
You can also build an extension from sources. First, make sure that you have `npm` installed, then clone the repo:
```shell
git clone https://git.platypush.tech/platypush/platypush-webext
```
Install the dependencies and build the extension:
```shell
npm install
npm run build
```
At the end of the process, you should have a `dist` folder with a `manifest.json`.
- In Chrome (or any Chromium-based browser), go to *Extensions* -> *Load Unpacked* and select the dist folder.
- In Firefox, go to `about:debugging` -> *This Firefox* -> *Load Temporary Add-on* and select the `manifest.json` file.
Note that recent versions of
Firefox [only support](https://blog.mozilla.org/addons/2018/02/22/removing-support-unpacked-extensions/) unpacked
extensions (i.e. any extension not loaded on the Firefox add-ons website) through `about:debugging`. This means that any
temporary extension will be lost when the browser is restarted — however, restoring the configuration of the Platypush
extension when its reinstalled is a very quick process.
Once installed in the browser, the extension icon will appear in the toolbar.
![Web extension screenshot 1](../img/extension-2.png)
Click on the available link to open the extension configuration tab and add your Platypush device in the configuration.
![Web extension screenshot 2](../img/extension-3.png)
Once the device is added, click on its name from the menu and select *Run Action*.
![Web extension screenshot 3](../img/extension-4.png)
The run tab comes with two modes: *request* and *script* mode. In request mode, you can run actions directly on a remote
Platypush device through a dynamic interface. Youve got a form with an autocomplete menu that displays all the actions
available on your device, and upon selection, the form is pre-populated with all the arguments available for that
action, their default values, and description. This interface is very similar to the execute tab provided by the
Platypush web panel, and it makes it super easy to quickly test and run commands on another host.
You can use this interface to run any action on any remote device as long as theres a plugin installed and configured
for it — file system management, media center controls, voice assistants, cameras, switches, getting data from sensors,
managing cloud services, you name it. You can also run procedures stored on the remote device — their action names start
with `procedure` — and you can also pass the URL in the active tab to action as an argument by using the special
variable `$URL$` as an action value. For instance, you can use it to create an action that sends the current URL to your
mobile device
through [`pushbullet.send_note`](https://docs.platypush.tech/en/latest/platypush/plugins/pushbullet.html#platypush.plugins.pushbullet.PushbulletPlugin.send_note),
with both `body` and `url` set to `$URL$`. Once youre happy with your action, you can save it so its available both
from the toolbar and the browser context menu.
You can also associate keybindings to your actions, so you can run them in your browser from any tab with just a few
keystrokes. The mappings are in the form `<Ctrl><Alt><n>`, with `n` between `0` and `9` - however, Chrome-based
browsers limit the number of keybindings per extension to a maximum of 4, for some odd reason that I completely ignore.
If you only needed a way to execute Platypush actions remotely from your browser, this is actually all you need. The
action will now be available from the extension toolbar:
![Web extension screenshot 4](../img/extension-5.png)
And from the context menu:
![Web extension screenshot 5](../img/extension-6.png)
You can easily debug/edit stored action from the Stored Action tab in the extensions configuration page.
## Script mode
The other (and most powerful) way to define custom actions is through scripts. Scripts can be used to glue together the
Platypush API (or any other API) and the browser API.
Select Script from the selector on the top of the Run Action tab. You will be presented with a JavaScript editor with a
pre-loaded script template:
![Web extension screenshot 6](../img/extension-7.png)
The page also provides a [link to a Gist](https://gist.github.com/BlackLight/d80c571705215924abc06a80994fd5f4) showing
examples for all the available pieces of the API. In a nutshell, these are the most important pieces you can use to
build your user scripts:
- `args` includes relevant context information for your scripts, such as the target Platypush `host`, the `tabId`, and
the `target` element, if the action was called from a context menu on a page.
- `app` exposes the API available to the script.
Among the methods exposed by app:
- `app.getURL` returns the URL in the active tab.
- `app.setURL` changes the URL rendered in the active tab, while `app.openTab` opens a URL in a new tab.
- `app.notify(message, title)` displays a browser notification.
- `app.run` executes actions on a remote Platypush device.
For example, this is a possible action to cast YouTube videos to the default Chromecast device:
```javascript
// Platypush user script to play the current URL
// on the Chromecast if it is a YouTube URL.
async (app, args) => {
const url = await app.getURL();
if (!url.startsWith('https://www.youtube.com/watch?v=')) {
return;
}
const response = await app.run({
action: 'media.chromecast.play',
args: {
resource: url,
},
}, args.host);
if (response.success) {
app.notify('YouTube video now playing on Chromecast');
}
}
```
- `app.axios.[get|post|put|delete|patch|head|options]`: The API also exposes the Axios API to perform custom AJAX calls
to remote endpoints. For example, if you want to save the current URL to your Instapaper account:
```javascript
// Sample Platypush user script to save the current URL to Instapaper
async (app, args) => {
const url = await app.getURL();
const response = await app.axios.get('https://www.instapaper.com/api/add', {
params: {
url: url,
username: '********@****.***',
password: '******',
},
});
const targetURL = `https://instapaper.com/read/${response.data.bookmark_id}`;
app.openTab(targetURL);
}
```
- `app.getDOM` returns the DOM/content of the current page (as a Node element), while `app.setDOM` replaces the
DOM/content of the page (given as a string). For example, you can combine the provided DOM API with
the [Platypush Translate plugin](https://docs.platypush.tech/en/latest/platypush/plugins/google.translate.html)
to translate a web page on the fly:
```javascript
// Platypush user script to translate a web page through the Google Translate API
async (app, args) => {
const dom = await app.getDOM();
// Translate the page through the Platypush Google Translate plugin
// (https://docs.platypush.tech/en/latest/platypush/plugins/google.translate.html).
// The plugin also splits the HTML in multiple requests if too long
// to circumvent Google's limit on maximum input text.
const response = await app.run({
action: 'google.translate.translate',
args: {
text: dom.body.innerHTML,
format: 'html',
target_language: 'en',
}
}, args.host);
// The new body will contain a <div> with the translated HTML,
// a hidden <div> with the original HTML and a top fixed button
// to switch back to the original page.
const translatedDiv = `
<div class="platypush__translated-container">
<div class="platypush__translated-banner">
<button type="button"
onclick="document.body.innerHTML = document.querySelector('.platypush__original-body').innerHTML">
See Original
</button>
</div>
<div class="platypush__translated-body">
${response.translated_text}
</div>
<div class="platypush__original-body">
${dom.body.innerHTML}
</div>
</div>`;
const style = `
<style>
.platypush__original-body {
display: none;
}
.platypush__translated-banner {
position: fixed;
background: white;
color: black;
z-index: 99999;
}
</style>`;
// Reconstruct the DOM and change it.
dom.head.innerHTML += style;
dom.body.innerHTML = translatedDiv;
await app.setDOM(`<html>${dom.getElementsByTagName('html')[0].innerHTML}</html>`);
}
```
- The extension API also exposes the [Mercury Reader API](https://github.com/postlight/mercury-parser) to
simplify/distill the content of a web page. You can combine the elements seen so far into a script that simplifies the
content of a web page for better readability or to make it more printer-friendly:
```javascript
// Platypush sample user script to simplify/distill the content of a web page
async (app, args) => {
const url = await app.getURL();
// Get and parse the page body through the Mercury API
const dom = await app.getDOM();
const html = dom.body.innerHTML;
const response = await app.mercury.parse(url, html);
// Define a new DOM that contains the simplified body as well as
// the original body as a hidden <div>, and provide a top fixed
// button to switch back to the original content.
const style = `
<style>
.platypush__simplified-body {
max-width: 30em;
margin: 0 auto;
font-family: Helvetica, Arial, sans-serif;
font-size: 20px;
color: #333;
}
.platypush__simplified-body h1 {
border-bottom: 1px solid black;
margin-bottom: 1em;
padding-top: 1em;
}
.platypush__original-body {
display: none;
}
.platypush__simplified-banner {
position: fixed;
top: 0.5em;
left: 0.5em;
background: white;
color: black;
z-index: 99999;
}
</style>`;
const simplifiedDiv = `
<div class="platypush__simplified-container">
<div class="platypush__simplified-banner">
<a href="#"
onclick="document.body.innerHTML = document.querySelector('.platypush__original-body').innerHTML; return false;">
See Original
</a>
</div>
<div class="platypush__simplified-body">
<h1>${response.title}</h1>
${response.content}
</div>
<div class="platypush__original-body">
${dom.body.innerHTML}
</div>
</div>`;
// Construct and replace the DOM
dom.head.innerHTML += style;
dom.body.innerHTML = simplifiedDiv;
await app.setDOM(`<html>${dom.getElementsByTagName('html')[0].innerHTML}</html>`);
}
```
- Finally, you can access the `target` element if you run the action through a context menu (for example, right-click on
an item on the page). Because of WebExtensions API limitations (which can only pass JSON-serializable objects around),
the `target` element is passed on the args as a string, but you can easily convert it to a DOM object (and you can
convert any HTML to DOM) through the `app.HTML2DOM` method. For example, you can extend the initial YouTube to
Chromecast user script to cast any audio or video item present on a page:
```javascript
// Sample Platypush user script to cast the current tab or any media item selected
// on the page to the default Chromecast device configured in Platypush.
async (app, args) => {
const baseURL = await app.getURL();
// Default URL to cast: current page URL
let url = baseURL;
if (args.target) {
// The user executed the action from a context menu
const target = app.HTML2DOM(args.target);
// If it's a <video> or <audio> HTML element then it will have a <source> tag
const src = target.querySelector('source');
// Otherwise, check if it's a link
const link = target.tagName.toLowerCase() === 'a' ? target : target.querySelector('a');
if (src) {
url = src.attributes.src.value;
} else if (link) {
url = link.attributes.href.value;
}
}
// Resolve relative URLs
if (!url.match('^https?://[^/]+') && !url.startsWith('magnet:?')) {
url = baseURL.match('(^https?://[^/]+)')[1] + (url || '');
}
// Finally, cast the media
await app.run({
action: 'media.chromecast.play',
args: {
resource: url,
}
}, args.host);
}
```
With these basic blocks, you should be able to create any custom browser actions that you want. Some examples:
- Convert the current web page to PDF through
the [Platypush `webpage.simplify` plugin](https://docs.platypush.tech/en/latest/platypush/plugins/http.webpage.html)
and deliver it to your Kindle as an attachment through
the [Platypush GMail plugin](https://docs.platypush.tech/en/latest/platypush/plugins/google.mail.html).
- Send an email to someone containing the text selected on a page.
- Translate on the fly some text selected on a page.
- Share the current link to Twitter/Facebook/LinkedIn (and ditch away all the other share-to-social extensions).
- Download magnet/torrent links on a page directly to your NAS.
## Configuration Backup and Restore
Finally, you can easily edit, back up, and restore the extension configuration through the *Configuration* tab. The
configuration can either be loaded/copied to a file, restored from/backed up remotely to a Platypush device (look ma,
no cloud!), or loaded from a URL.
![Web extension screenshot 7](../img/extension-8.png)
## Work in Progress
The extension is still under development, and Im open to suggestions, tickets, and pull requests on the
[Gitlab page](https://git.platypush.tech/platypush/platypush-webext). Two features, in particular, are next on my
roadmap:
### Integration with the Platypush WebSocket protocol
That would allow many interesting features, such as checking the health status of the remote devices, transferring
larger chunks of data (like the audio/video stream from the remote device) and, most of all, setting up event hooks —
scripts that run automatically when a Platypush device triggers an event like e.g. voice assistant response processed,
media status changed, lights scene changed, new sensor data is received, etc.
### Support for third-party libraries through the API
As of now, the scripts API exposes the `axios` and `mercury` parser libraries, but I believe that for sake of
flexibility, it should be possible to import external libraries in the user scripts through something as simple as:
```javascript
const jquery = app.loadScript('https://some.cdn/jquery.min.js');
```
Eventually, if the project gets enough traction, Id love to create a repository where users could share their creations
— as long as we all keep in mind that with great power comes great responsibility :)
## Conclusions
I understand the security and uniformity concerns that led to the adoption of the WebExtensions API. But I also
understand why many extensions that used to rely on deeper integrations with the browser have refused to compromise with
the new API and have been discontinued in the meantime.
I struggled a lot myself developing this extension with the new API, as the current WebExtensions API creates many
sandboxes and it only makes pieces of information accessible to a particular context (like the background script, the
content script, or the popup context), and it forces developers that need information and features to be accessed by
multiple parts of their application to set up complex messaging systems to pass data around. It also puts hard
constraints on what pieces of code can be executed where (good luck trying to get away with an evalin your code). With
the Platypush web extension, I have tried to fill the void left by the deprecation of the previous
extensions/apps/add-ons APIs and to provide a layer for power users to deeply customize the behavior of their browsers.
I also wanted to build an extension that could give me easy access, from the same UI and the same button, to the
increasingly fragmented world of smart devices around me. And I had also grown tired of seeing the space on my toolbar
being eaten up by tons of icons of extensions that could only perform one specific task!