Migrated 9th article

2021-01-31 13:07:59 +01:00 · 2021-01-31 13:07:59 +01:00 · 2c7ce5e5c9
commit 2c7ce5e5c9
parent b846b928c6
2 changed files with 742 additions and 0 deletions
--- a/static/img/voice-1.jpg
+++ b/static/img/voice-1.jpg
--- a/static/pages/Build-custom-voice-assistants.md
+++ b/static/pages/Build-custom-voice-assistants.md
@@ -0,0 +1,742 @@
+[//]: # (title: Build custom voice assistants)
+[//]: # (description: An overview of the current technologies and how to leverage Platypush to build your customized assistant.)
+[//]: # (image: /img/voice-1.jpg)
+[//]: # (author: Fabio Manganiello <fabio@platypush.tech>)
+[//]: # (published: 2020-03-08)
+
+I wrote [an article](https://blog.platypush.tech/article/Build-your-customizable-voice-assistant-with-Platypush) a while
+ago that describes how to make your own Google-based voice assistant using just a RaspberryPi, Platypush, a speaker and
+a microphone.
+
+It also showed how to make your own custom hotword model that triggers the assistant if you don’t want to say “Ok
+Google,” or if you want distinct hotwords to trigger different assistants in different languages. It also showed how to
+hook your own custom logic and scripts when certain phrases are recognized, without writing any code.
+
+Since I wrote that article, a few things have changed:
+
+- When I wrote the article, Platypush only supported the Google Assistant as a voice back end. In the meantime, I’ve
+  worked on [supporting Alexa as well](https://github.com/BlackLight/platypush/issues/80). Feel free to use the
+  `assistant.echo` integration in Platypush if you’re an Alexa fan, but bear in mind that it’s more limited than the
+  existing Google Assistant based options because of limitations in the AVS (Alexa Voice Service). For example, it
+  won’t provide the transcript of either the detected text or the rendered response, which means it’s not possible to
+  insert custom hooks on them, because the AVS mostly works with audio files as input and provides audio as output. It
+  could also experience some minor audio glitches, at least on RaspberryPi.
+
+- Although deprecated, a new release of the Google Assistant
+  Library [has been made available](https://github.com/googlesamples/assistant-sdk-python/releases/tag/0.6.0) to fix the
+  segmentation fault issue on RaspberryPi 4. I’ve buzzed the developers often over the past year and I’m glad that it’s
+  been done! It’s good news because the Assistant library has the best engine for hotword detection I’ve seen. No other
+  SDK I’ve tried (Snowboy, DeepSpeech, or PicoVoice) comes close to the native “Ok Google” hotword detection accuracy
+  and performance. The news isn’t all good, however: the library is still deprecated, with no alternative currently
+  on the horizon. The new release was mostly made in response to user requests to fix things on the new RaspberryPi. But
+  at least one of the best options out there to build a voice assistant will still work for a while. Those interested in
+  building a custom voice assistant that acts 100% like a native Google Assistant can read my previous article.
+
+- In the meantime, the shaky situation of the official voice assistant SDK has motivated me to research more
+  state-of-the-art alternatives. I’ve been a long-time fan of [Snowboy](https://snowboy.kitt.ai/), which has a
+  well-supported Platypush integration, and I’ve used it as a hotword engine to trigger other assistant integrations for
+  a long time. However, when it comes to accuracy in real-time scenarios, even its best models aren’t that satisfactory.
+  I’ve also experimented with
+  [Mozilla DeepSpeech](https://github.com/mozilla/DeepSpeech) and [PicoVoice](https://github.com/Picovoice) products
+  for voice detection, and built Platypush integrations for both. In this article, I’ll try to provide a comprehensive
+  overview of what’s currently possible with DIY voice assistants and a comparison of the integrations I’ve built.
+
+- **EDIT January 2021**: Unfortunately, as of Dec 31st,
+  2020 [Snowboy has been officially shut down](https://github.com/Kitt-AI/snowboy/). The GitHub repository is still
+  there: you can still clone it and either use the example models provided under `resources/models`, train a model
+  using the Python API, or use any of your previously trained models. However, the repo is no longer maintained, and the
+  website that could be used to browse and generate user models is no longer available. It's really a shame. The user
+  models provided by Snowboy were usually quite far from perfect, but it was a great example of a crowd-trained
+  open-source project, and it shows how difficult it is to keep such projects alive without anybody funding the
+  time invested by the developers in them. Anyway, most of the Snowboy examples reported in this article will still work
+  if you download and install the code from the repo.
+
+## The Case for DIY Voice Assistants
+
+Why would anyone bother to build their own voice assistant when cheap Google or Alexa assistants can be found anywhere? Despite how pervasive these products have become, I decided to power my whole house with several DIY assistants for a number of reasons:
+
+- **Privacy**. The easiest one to guess! I’m not sure if a microphone in the house, active 24/7, connected to a private
+  company through the internet is a proportionate price to pay for between five and ten interactions a day to toggle the
+  lightbulbs, turn on the thermostat, or play a Spotify playlist. I’ve built the voice assistant integrations in
+  Platypush with the goal of giving people the option of voice-enabled services without sending all of the daily voice
+  interactions over a privately-owned channel through a privately-owned box.
+
+- **Compatibility**. A Google Assistant device will only work with devices that support Google Assistant. The same goes
+  for Alexa-powered devices. Some devices may lose some of their voice-enabled capabilities — either temporarily,
+  depending on the availability of the cloud connections, or permanently, because of hardware or software deprecation or
+  other commercial factors. My dream voice assistant works natively with any device, as long as it has an SDK or API to
+  interact with, and does not depend on business decisions.
+
+- **Flexibility**. Even when a device works with your assistant, you’re still bound to the features that have been
+  agreed and implemented by the two parties. Implementing more complex routines over voice commands is usually tricky.
+  In most cases, it involves creating code that will run on the cloud (either in the form of Actions or Lambdas, or
+  IFTTT rules), not in your own network, which limits the actual possibilities. My dream assistant must have the ability
+  to run whichever logic I want on whichever device I want, using whichever custom shortcut I want (even with regex
+  matching), regardless of the complexity. I also aimed to build an assistant that can provide multiple services
+  (Google, Alexa, Siri etc.) in multiple languages on the same device, simply by using different hotwords.
+
+- **Hardware constraints**. I’ve never understood the case for selling plastic boxes that embed a microphone and a speaker
+  in order to enter the world of voice services. That was a good way to showcase the idea. After a couple of years of
+  experiments, it’s probably time to expect the industry to provide a voice assistant experience that can run on any
+  device, as long as it has a microphone and a controller unit that can process code. As for compatibility, there should
+  be no case for Google-compatible or Alexa-compatible devices. Any device should be compatible with any assistant, as
+  long as that device has a way to communicate with the outside world. The logic to control that device should be able
+  to run on the same network that the device belongs to.
+
+- **Cloud vs. local processing**. Most of the commercial voice assistants operate by regularly capturing streams of
+  audio, scanning for the hotword in the audio chunks through their cloud-provided services, and opening another
+  connection to their cloud services once the hotword is detected, to parse the speech and to provide the response. In
+  some cases, even the hotword detection is, at least partly, run in the cloud. In other words, most of the voice
+  assistants are dumb terminals intended to communicate with cloud providers that actually do most of the job, and they
+  exchange a huge amount of information over the internet in order to operate. This may be sensible when your targets
+  are low-power devices that operate within a fast network and you don’t need much flexibility. But if you can afford to
+  process the audio on a more capable CPU, or if you want to operate on devices with limited connectivity, or if you
+  want to do things that you usually can’t do with off-the-shelf solutions, you may want to process as much as possible
+  of the load on your device. I understand the case for a cloud-oriented approach when it comes to voice assistants but,
+  regardless of the technology, we should always be provided with a choice between decentralized and centralized
+  computing. My dream assistant must have the ability to run the hotword and speech detection logic either on-device or
+  on-cloud, depending on the use case and depending on the user’s preference.
+
+- **Scalability**. If I need a new voice assistant in another room or house, I just grab a RaspberryPi, flash the copy
+  of my assistant-powered OS image to the SD card, plug in a microphone and a speaker, and it’s done, without having to
+  buy a new plastic box. If I need a voice-powered music speaker, I just take an existing speaker and plug it into a
+  RaspberryPi. If I need a voice-powered display, I just take an existing display and plug it into a RaspberryPi. If I
+  need a voice-powered switch, I just write a rule for controlling it on voice command directly on my RaspberryPi,
+  without having to worry about whether it’s supported in my Google Home or Alexa app. Any device should be given the
+  possibility of becoming a smart device.
+
+## Overview of the voice assistant integrations
+
+A voice assistant usually consists of two components:
+
+- An **audio recorder** that captures frames from an audio input device
+- A **speech engine** that keeps track of the current context.
+
+There are then two main categories of speech engines: hotword detectors, which scan the audio input for the presence of
+specific hotwords (like “Ok Google” or “Alexa”), and speech detectors, which instead do proper speech-to-text
+transcription using acoustic and language models. As you can imagine, continuously running a full speech detection has a
+far higher overhead than just running hotword detection, which only has to compare the captured speech against the
+(usually short) list of stored hotword models. Then there are speech-to-intent engines, like PicoVoice’s Rhino. Instead
+of providing a text transcription as output, these provide a structured breakdown of the speech intent. For example, if
+you say *“Can I have a small double-shot espresso with a lot of sugar and some milk”* they may return something like
+`{"type": "espresso", "size": "small", "numberOfShots": 2, "sugar": "a lot", "milk": "some"}`.
+
+In Platypush, I’ve built integrations to provide users with a wide choice when it comes to speech-to-text processors and
+engines. Let’s go through some of the available integrations, and evaluate their pros and cons.
+
+## Native Google Assistant library
+
+### Integrations
+
+- [`assistant.google`](https://platypush.readthedocs.io/en/latest/platypush/plugins/assistant.google.html) plugin (to
+  programmatically start/stop conversations)
+  and [`assistant.google`](https://platypush.readthedocs.io/en/latest/platypush/backend/assistant.google.html) backend
+  (for continuous hotword detection).
+
+### Configuration
+
+- Create a Google project and download the `credentials.json` file from
+  the [Google developers console](https://console.cloud.google.com/apis/credentials).
+
+- Install the `google-oauthlib-tool`:
+
+```shell
+[sudo] pip install --upgrade 'google-auth-oauthlib[tool]'
+```
+
+- Authenticate to use the `assistant-sdk-prototype` scope:
+
+```shell
+export CREDENTIALS_FILE=~/.config/google-oauthlib-tool/credentials.json
+
+google-oauthlib-tool --scope https://www.googleapis.com/auth/assistant-sdk-prototype \
+      --scope https://www.googleapis.com/auth/gcm \
+      --save --headless --client-secrets $CREDENTIALS_FILE
+```
+
+- Install Platypush with the HTTP backend and Google Assistant library support:
+
+```shell
+[sudo] pip install 'platypush[http,google-assistant-legacy]'
+```
+
+- Create or add the lines to `~/.config/platypush/config.yaml` to enable the webserver and the assistant integration:
+
+```yaml
+backend.http:
+    enabled: True
+    
+backend.assistant.google:
+    enabled: True
+    
+assistant.google:
+    enabled: True
+```
+
+- Start Platypush, say “Ok Google” and enjoy your assistant. On the web panel on `http://your-rpi:8008` you should be
+  able to see your voice interactions in real-time.
+
+### Features
+
+- *Hotword detection*: **YES** (“Ok Google” or “Hey Google”).
+- *Speech detection*: **YES** (once the hotword is detected).
+- *Detection runs locally*: **NO** (hotword detection seems to run locally, but once the hotword is detected a channel
+  is opened with Google servers for the interaction).
+
+### Pros
+
+- It implements most of the features that you’d find in any Google Assistant products. That includes native support for
+  timers, calendars, customized responses on the basis of your profile and location, native integration with the devices
+  configured in your Google Home, and so on. For more complex features, you’ll have to write your custom Platypush hooks
+  on e.g. speech detected or conversation start/end events.
+
+- Both hotword detection and speech detection are rock solid, as they rely on the Google cloud capabilities.
+
+- Good performance even on older RaspberryPi models (the library isn’t available for the Zero model or other ARMv6-based
+  devices though), because most of the processing duties actually happen in the cloud. The audio processing thread takes
+  around 2–3% of the CPU on a RaspberryPi 4.
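+
+As a rough illustration of such a custom hook, the sketch below reacts to a recognized phrase. It assumes that the
+`light.hue` plugin is configured and that a light group named `Living Room` exists; the hook name, the phrase and the
+action are placeholders for whatever logic you want to run:
+
+```yaml
+# Hypothetical hook name: any unique label works
+event.hook.TurnOnLightsOnVoiceCommand:
+    if:
+        # Fired by the assistant once a phrase has been recognized
+        type: platypush.message.event.assistant.SpeechRecognizedEvent
+        phrase: "turn on the lights"
+    then:
+        # Assumed action: replace it with any plugin action you have configured
+        - action: light.hue.on
+          args:
+              groups:
+                  - Living Room
+```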
+
+### Cons
+
+- The Google Assistant library used as a backend by the integration has
+  been [deprecated by Google](https://developers.google.com/assistant/sdk/reference/library/python). It still works on
+  most of the devices I’ve tried, as long as the latest version is used, but keep in mind that it’s no longer maintained
+  by Google and it could break in the future. Unfortunately, I’m still waiting for an official alternative.
+
+- If your main goal is to operate voice-enabled services within a secure environment with no processing happening on
+  someone else’s cloud, then this is not your best option. The assistant library makes your computer behave more or less
+  like a full Google Assistant device, including capturing audio and sending it to Google servers for processing and,
+  potentially, review.
+
+## Google Assistant Push-To-Talk Integration
+
+### Integrations
+
+- [`assistant.google.pushtotalk`](https://platypush.readthedocs.io/en/latest/platypush/plugins/assistant.google.pushtotalk.html)
+  plugin.
+
+### Configuration
+
+- Create a Google project and download the `credentials.json` file from
+  the [Google developers console](https://console.cloud.google.com/apis/credentials).
+
+- Install the `google-oauthlib-tool`:
+
+```shell
+[sudo] pip install --upgrade 'google-auth-oauthlib[tool]'
+```
+
+- Authenticate to use the `assistant-sdk-prototype` scope:
+
+```shell
+export CREDENTIALS_FILE=~/.config/google-oauthlib-tool/credentials.json
+
+google-oauthlib-tool --scope https://www.googleapis.com/auth/assistant-sdk-prototype \
+      --scope https://www.googleapis.com/auth/gcm \
+      --save --headless --client-secrets $CREDENTIALS_FILE
+```
+
+- Install Platypush with the HTTP backend and Google Assistant SDK support:
+
+```shell
+[sudo] pip install 'platypush[http,google-assistant]'
+```
+
+- Create or add the lines to `~/.config/platypush/config.yaml` to enable the webserver and the assistant integration:
+
+```yaml
+backend.http:
+    enabled: True
+    
+assistant.google.pushtotalk:
+    language: en-US
+```
+
+- Start Platypush. Unlike the native Google library integration, the push-to-talk plugin doesn’t come with a hotword
+  detection engine. You can initiate or end conversations programmatically through e.g. Platypush event hooks,
+  procedures, or through the HTTP API:
+
+```shell
+curl -XPOST -H 'Content-Type: application/json' -d '
+{
+    "type":"request",
+    "action":"assistant.google.pushtotalk.start_conversation"
+}' -u 'username:password' http://your-rpi:8008/execute
+```
+
+### Features
+
+- *Hotword detection*: **NO** (call `start_conversation` or `stop_conversation` from your logic or from the context of a
+  hotword integration like Snowboy, DeepSpeech or PicoVoice to trigger or stop the assistant).
+
+- *Speech detection*: **YES**.
+
+- *Detection runs locally*: **NO** (you can customize the hotword engine and how to trigger the assistant, but once a
+  conversation is started a channel is opened with Google servers).
+
+### Pros
+
+- It implements many of the features you’d find in any Google Assistant product out there, even though hotword detection
+  isn’t available and some of the features currently available on the assistant library aren’t provided (like timers or
+  alarms).
+
+- Rock-solid speech detection, using the same speech model used by Google Assistant products.
+
+- Relatively good performance even on older RaspberryPi models. It’s also available for the ARMv6 architecture, which
+  makes it suitable for RaspberryPi Zero or other low-power devices. No hotword engine running means that it uses
+  resources only when you call `start_conversation`.
+
+- It provides the benefits of the Google Assistant speech engine with no need to have a 24/7 open connection between
+  your mic and Google’s servers. The connection is only opened upon `start_conversation`. This makes it a good option if
+  privacy is a concern, or if you want to build more flexible assistants that can be triggered through different hotword
+  engines (or even build assistants that are triggered in different languages depending on the hotword that you use), or
+  assistants that aren’t triggered by a hotword at all: for example, you can call `start_conversation` upon button press,
+  motion sensor event or web call.
+
+### Cons
+
+- I’ve built this integration after the deprecation of the Google Assistant library occurred with no official
+  alternatives being provided. I’ve built it by refactoring the poorly refined code provided by Google in its samples (
+  [`pushtotalk.py`](https://github.com/googlesamples/assistant-sdk-python/blob/master/google-assistant-sdk/googlesamples/assistant/grpc/pushtotalk.py))
+  and making a proper plugin out of it. It works, but keep in mind that it’s based on some ugly code that’s waiting to
+  be replaced by Google.
+
+- No hotword support. You’ll have to hook it up to Snowboy, PicoVoice or DeepSpeech if you want hotword support.
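+
+One possible way to wire a hotword engine to this plugin is an event hook like the sketch below. It assumes that one of
+the `stt.*` hotword backends described later in this article is enabled, that `computer` is among its configured
+hotwords, and that the event exposes the detected word as `hotword`; the hook name is arbitrary:
+
+```yaml
+# Hypothetical hook name
+event.hook.StartAssistantOnHotword:
+    if:
+        type: platypush.message.event.stt.HotwordDetectedEvent
+        # Assumed hotword: it must match one configured in your hotword engine
+        hotword: computer
+    then:
+        # Open the connection to the assistant only after the hotword is heard
+        - action: assistant.google.pushtotalk.start_conversation
+```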
+
+## Alexa Integration
+
+### Integrations
+
+- [`assistant.echo`](https://platypush.readthedocs.io/en/latest/platypush/plugins/assistant.echo.html) plugin.
+
+### Configuration
+
+- Install Platypush with the HTTP backend and Alexa support:
+
+```shell
+[sudo] pip install 'platypush[http,alexa]'
+```
+
+- Run `alexa-auth`. It will start a local web server on your machine on `http://your-rpi:3000`. Open it in your browser
+  and authenticate with your Amazon account. A credentials file should be generated under `~/.avs.json`.
+
+- Create or add the lines to your `~/.config/platypush/config.yaml` to enable the webserver and the assistant
+  integration:
+
+```yaml
+backend.http:
+    enabled: True
+    
+assistant.echo:
+    enabled: True
+```
+
+- Start Platypush. The Alexa integration doesn’t come with a hotword detection engine. You can initiate or end
+  conversations programmatically through e.g. Platypush event hooks, procedures, or through the HTTP API:
+
+```shell
+curl -XPOST -H 'Content-Type: application/json' -d '
+{
+    "type":"request",
+    "action":"assistant.echo.start_conversation"
+}' -u 'username:password' http://your-rpi:8008/execute
+```
+
+### Features
+
+- *Hotword detection*: **NO** (call `start_conversation` or `stop_conversation` from your logic or from the context of a
+  hotword integration like Snowboy or PicoVoice to trigger or stop the assistant).
+
+- *Speech detection*: **YES** (although limited: transcription of the processed audio won’t be provided).
+
+- *Detection runs locally*: **NO**.
+
+### Pros
+
+- It implements many of the features that you’d find in any Alexa product out there, even though hotword detection isn’t
+  available. Also, the support for skills or media control may be limited.
+
+- Good speech detection capabilities, although inferior to the Google Assistant when it comes to accuracy.
+
+- Good performance even on low-power devices. No hotword engine running means it uses resources only when you call
+  `start_conversation`.
+
+- It provides some of the benefits of an Alexa device but with no need for a 24/7 open connection between your mic and
+  Amazon’s servers. The connection is only opened upon `start_conversation`.
+
+### Cons
+
+- The situation is extremely fragmented when it comes to Alexa voice SDKs. Amazon eventually re-released the AVS (Alexa
+  Voice Service), mostly with commercial uses in mind, but its features are still quite limited compared to the Google
+  Assistant products. The biggest limitation is the fact that the AVS works on raw audio input and spits back raw audio
+  responses. It means that text transcription, either for the request or the response, won’t be available. That limits
+  what you can build with it. For example, you won’t be able to capture custom requests through event hooks.
+
+- No hotword support. You’ll have to hook it up to Snowboy, PicoVoice or DeepSpeech if you want hotword support.
+
+## Snowboy Integration
+
+### Integrations
+
+- [`assistant.snowboy`](https://platypush.readthedocs.io/en/latest/platypush/backend/assistant.snowboy.html) backend.
+
+### Configuration
+
+- Install Platypush with the HTTP backend and Snowboy support:
+
+```shell
+[sudo] pip install 'platypush[http,snowboy]'
+```
+
+- Choose your hotword model(s). Some are available under `SNOWBOY_INSTALL_DIR/resources/models`. Otherwise, you can
+  train or download models from the [Snowboy website](https://snowboy.kitt.ai/).
+
+- Create or add the lines to your `~/.config/platypush/config.yaml` to enable the webserver and the assistant
+  integration:
+
+```yaml
+backend.http:
+    enabled: True
+    
+backend.assistant.snowboy:
+    audio_gain: 1.2
+    models:
+        # Trigger the Google assistant in Italian when I say "computer"
+        computer:
+            voice_model_file: ~/models/computer.umdl
+            assistant_plugin: assistant.google.pushtotalk
+            assistant_language: it-IT
+            detect_sound: ~/sounds/bell.wav
+            sensitivity: 0.4
+
+        # Trigger the Google assistant in English when I say "OK Google"
+        ok_google:
+            voice_model_file: ~/models/OK Google.pmdl
+            assistant_plugin: assistant.google.pushtotalk
+            assistant_language: en-US
+            detect_sound: ~/sounds/bell.wav
+            sensitivity: 0.4
+
+        # Trigger Alexa when I say "Alexa"
+        alexa:
+            voice_model_file: ~/models/Alexa.pmdl
+            assistant_plugin: assistant.echo
+            assistant_language: en-US
+            detect_sound: ~/sounds/bell.wav
+            sensitivity: 0.5
+```
+
+- Start Platypush. Say the hotword associated with one of your models, check on the logs that the
+  [`HotwordDetectedEvent`](https://platypush.readthedocs.io/en/latest/platypush/events/assistant.html#platypush.message.event.assistant.HotwordDetectedEvent)
+  is triggered and, if there’s an assistant plugin associated with the hotword, the corresponding assistant is correctly
+  started.
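+
+If a model has no `assistant_plugin` associated, the hotword event can still drive your own logic. The hook below is a
+minimal sketch of that idea: it assumes the event carries the model name from the configuration above as `hotword`, and
+the `shell.exec` command is only a placeholder action:
+
+```yaml
+# Hypothetical hook name
+event.hook.OnComputerHotword:
+    if:
+        type: platypush.message.event.assistant.HotwordDetectedEvent
+        # Assumed to match the model key used in backend.assistant.snowboy
+        hotword: computer
+    then:
+        # Placeholder action: swap in any plugin action
+        - action: shell.exec
+          args:
+              cmd: echo "hotword detected"
+```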
+
+### Features
+
+- *Hotword detection*: **YES**.
+- *Speech detection*: **NO**.
+- *Detection runs locally*: **YES**.
+
+### Pros
+
+- I've been an early fan and supporter of the Snowboy project. I really like the idea of crowd-powered machine learning.
+  You can download any hotword models for free from their website, provided that you record three audio samples of you
+  saying that word in order to help improve the model. You can also create your custom hotword model, and if enough
+  people are interested in using it then they’ll contribute with their samples, and the model will become more robust
+  over time. I believe that more machine learning projects out there could really benefit from this “use it for free as
+  long as you help improve the model” paradigm.
+
+- Platypush was an early supporter of Snowboy, so its integration is well-supported and extensively documented. You can
+  natively configure custom assistant plugins to be executed when a certain hotword is detected, making it easy to make
+  a multi-language and multi-hotword voice assistant.
+
+- Good performance, even on low-power devices. I’ve used Snowboy in combination with the Google Assistant push-to-talk
+  integration for a while on single-core RaspberryPi Zero devices, and the CPU usage from hotword processing never
+  exceeded 20–25%.
+
+- The hotword detection runs locally, on models that are downloaded locally. That means no need for a network connection
+  to run and no data exchanged with any cloud.
+
+### Cons
+
+- Even though the idea of crowd-powered voice models is definitely interesting and has plenty of potential to scale up,
+  the most popular models on their website have been trained with at most 2000 samples. And (sadly as well as
+  expectedly) most of those voice samples belong to white, young-adult males, which makes many of these models perform
+  quite poorly with speech recorded from individuals who don’t fit within that category (and also with people who
+  aren’t native English speakers).
+
+## Mozilla DeepSpeech
+
+### Integrations
+
+- [`stt.deepspeech`](https://platypush.readthedocs.io/en/latest/platypush/plugins/stt.deepspeech.html) plugin
+  and [`stt.deepspeech`](https://platypush.readthedocs.io/en/latest/platypush/backend/stt.deepspeech.html) backend (for
+  continuous detection).
+
+### Configuration
+
+- Install Platypush with the HTTP backend and Mozilla DeepSpeech support. Take note of the version of DeepSpeech that
+  gets installed:
+
+```shell
+[sudo] pip install 'platypush[http,deepspeech]'
+```
+
+- Download the TensorFlow model files for the version of DeepSpeech that has been installed. This may take a while
+  depending on your connection:
+
+```shell
+export MODELS_DIR=~/models
+export DEEPSPEECH_VERSION=0.6.1
+
+wget https://github.com/mozilla/DeepSpeech/releases/download/v$DEEPSPEECH_VERSION/deepspeech-$DEEPSPEECH_VERSION-models.tar.gz
+
+tar xvf deepspeech-$DEEPSPEECH_VERSION-models.tar.gz
+# Files extracted from the archive:
+# x deepspeech-0.6.1-models/
+# x deepspeech-0.6.1-models/lm.binary
+# x deepspeech-0.6.1-models/output_graph.pbmm
+# x deepspeech-0.6.1-models/output_graph.pb
+# x deepspeech-0.6.1-models/trie
+# x deepspeech-0.6.1-models/output_graph.tflite
+
+mv deepspeech-$DEEPSPEECH_VERSION-models $MODELS_DIR
+```
+
+- Create or add the lines to your `~/.config/platypush/config.yaml` to enable the webserver and the DeepSpeech
+  integration:
+
+```yaml
+backend.http:
+    enabled: True
+    
+stt.deepspeech:
+    model_file: ~/models/output_graph.pbmm
+    lm_file: ~/models/lm.binary
+    trie_file: ~/models/trie
+
+    # Custom list of hotwords
+    hotwords:
+        - computer
+        - alexa
+        - hello
+
+    conversation_timeout: 5
+          
+backend.stt.deepspeech:
+    enabled: True
+```
+
+- Start Platypush. Speech detection will start running on startup.
+  [`SpeechDetectedEvents`](https://platypush.readthedocs.io/en/latest/platypush/events/stt.html#platypush.message.event.stt.SpeechDetectedEvent)
+  will be triggered when you talk.
+  [`HotwordDetectedEvents`](https://platypush.readthedocs.io/en/latest/platypush/events/stt.html#platypush.message.event.stt.HotwordDetectedEvent)
+  will be triggered when you say one of the configured hotwords.
+  [`ConversationDetectedEvents`](https://platypush.readthedocs.io/en/latest/platypush/events/stt.html#platypush.message.event.stt.ConversationDetectedEvent)
+  will be triggered when you say something after a hotword, with speech provided as an argument. You can also disable the
+  continuous detection and only start it programmatically by calling `stt.deepspeech.start_detection` and
+  `stt.deepspeech.stop_detection`. You can also use it to perform offline speech transcription from audio files:
+
+```shell
+curl -XPOST -H 'Content-Type: application/json' -d '
+{
+    "type":"request",
+    "action":"stt.deepspeech.detect",
+    "args": {
+        "audio_file": "~/audio.wav"
+    }
+}' -u 'username:password' http://your-rpi:8008/execute
+
+# Example response
+{
+    "type":"response",
+    "target":"http",
+    "response": {
+        "errors":[],
+        "output": {
+            "speech": "This is a test"
+        }
+    }
+}
+```
+
+### Features
+
+- *Hotword detection*: **YES**.
+- *Speech detection*: **YES**.
+- *Detection runs locally*: **YES**.
+
+### Pros
+
+- I’ve been honestly impressed by the features of DeepSpeech and the progress they’ve made starting from version
+  0.6.0. Mozilla made it easy to run both hotword and speech detection on-device with no need for any third-party
+  services or network connection. The full codebase is open-source and the TensorFlow voice and language models are also
+  very good. It’s amazing that they’ve released the whole thing for free to the community. It also means that you can
+  easily extend the TensorFlow model by training it with your own samples.
+  
+- Speech-to-text transcription of audio files can be a very useful feature.
+
+### Cons
+
+- DeepSpeech is quite demanding when it comes to CPU resources. It will run OK on a laptop or on a RaspberryPi 4 (but in
+  my tests it took 100% of a core on a RaspberryPi 4 for speech detection). It may be too resource-intensive to run on
+  less powerful machines.
+
+- DeepSpeech has a bit more delay than other solutions. The engineers at Mozilla have worked a lot to make the model as
+  small and performant as possible, and they claim to have achieved real-time performance on a RaspberryPi 4. In
+  reality, all of my tests showed between 2 and 4 seconds of delay between speech capture and detection.
+
+- DeepSpeech is relatively good at detecting speech, but not at interpreting the semantic context (that’s something
+  where Google still wins hands down). If you say “this is a test,” the model may actually capture “these is a test.”
+  “This” and “these” do indeed sound almost the same in English, but the Google assistant has a better semantic engine
+  to detect the right interpretation of such ambiguous cases. DeepSpeech works quite well for speech-to-text
+  transcription purposes but, in such ambiguous cases, it lacks some semantic context.
+
+- Even though it’s possible to use DeepSpeech from Platypush as a hotword detection engine, keep in mind that it’s not
+  how the engine is intended to be used. Hotword engines usually run against smaller and more performant models only
+  intended to detect one or a few words, not against a full-featured language model. The best usage of DeepSpeech is
+  probably either for offline text transcription, or with another hotword integration and leveraging DeepSpeech for the
+  speech detection part.
+
+## PicoVoice
+
+[PicoVoice](https://github.com/Picovoice/) is a very promising company that has released several products for performing
+voice detection on-device. Among them:
+
+- [*Porcupine*](https://github.com/Picovoice/porcupine), a hotword engine.
+- [*Leopard*](https://github.com/Picovoice/leopard), a speech-to-text offline transcription engine.
+- [*Cheetah*](https://github.com/Picovoice/cheetah), a speech-to-text engine for real-time applications.
+- [*Rhino*](https://github.com/Picovoice/rhino), a speech-to-intent engine.
+
+So far, Platypush provides integrations with Porcupine and Cheetah.
+
+### Integrations
+
+- *Hotword engine*:
+  [`stt.picovoice.hotword`](https://platypush.readthedocs.io/en/latest/platypush/plugins/stt.picovoice.hotword.html)
+  plugin and
+  [`stt.picovoice.hotword`](https://platypush.readthedocs.io/en/latest/platypush/backend/stt.picovoice.hotword.html)
+  backend (for continuous detection).
+
+- *Speech engine*:
+  [`stt.picovoice.speech`](https://platypush.readthedocs.io/en/latest/platypush/plugins/stt.picovoice.speech.html)
+  plugin and
+  [`stt.picovoice.speech`](https://platypush.readthedocs.io/en/latest/platypush/backend/stt.picovoice.speech.html)
+  backend (for continuous detection).
+
+### Configuration
+
+- Install Platypush with the HTTP backend and the PicoVoice hotword integration and/or speech integration:
+
+```shell
+[sudo] pip install 'platypush[http,picovoice-hotword,picovoice-speech]'
+```
+
+- Create your `~/.config/platypush/config.yaml`, or add these lines to it, to enable the PicoVoice hotword and speech
+  integrations:
+
+```yaml
+stt.picovoice.hotword:
+    # Custom list of hotwords
+    hotwords:
+        - computer
+        - alexa
+        - hello
+        
+# Enable continuous hotword detection
+backend.stt.picovoice.hotword:
+    enabled: True
+  
+# Enable continuous speech detection
+# backend.stt.picovoice.speech:
+#     enabled: True
+
+# Or start speech detection when a hotword is detected
+event.hook.OnHotwordDetected:
+    if:
+        type: platypush.message.event.stt.HotwordDetectedEvent
+    then:
+        # Start a timer that stops the detection in 10 seconds
+        - action: utils.set_timeout
+          args:
+              seconds: 10
+              name: StopSpeechDetection
+              actions:
+                  - action: stt.picovoice.speech.stop_detection
+
+        - action: stt.picovoice.speech.start_detection
+```
+
+- Start Platypush and enjoy your on-device voice assistant.
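+
+The hook in the configuration above opens a time-limited listening window: a hotword starts speech detection, and a
+timer stops it again after 10 seconds. The same pattern can be sketched in plain Python with `threading.Timer`. Both
+`FakeSpeechEngine` and `DetectionWindow` are hypothetical names made up for this illustration, not Platypush classes,
+and restarting the countdown on a repeated hotword is a choice of this sketch.
+
+```python
+import threading
+from typing import Optional
+
+
+class FakeSpeechEngine:
+    """Hypothetical stand-in for a speech engine (not a Platypush class)."""
+
+    def __init__(self) -> None:
+        self.running = False
+
+    def start_detection(self) -> None:
+        self.running = True
+
+    def stop_detection(self) -> None:
+        self.running = False
+
+
+class DetectionWindow:
+    """Keep speech detection on for a fixed time after each hotword."""
+
+    def __init__(self, engine: FakeSpeechEngine, seconds: float = 10.0) -> None:
+        self.engine = engine
+        self.seconds = seconds
+        self.timer: Optional[threading.Timer] = None
+
+    def on_hotword(self) -> None:
+        # A new hotword restarts the countdown instead of stacking timers
+        if self.timer is not None:
+            self.timer.cancel()
+
+        self.engine.start_detection()
+        # Schedule the stop call once the listening window has elapsed
+        self.timer = threading.Timer(self.seconds, self.engine.stop_detection)
+        self.timer.daemon = True
+        self.timer.start()
+```
+
+Calling `DetectionWindow(engine).on_hotword()` turns detection on immediately and turns it off once the window has
+elapsed.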
+
+### Features
+
+- *Hotword detection*: **YES**.
+- *Speech detection*: **YES**.
+- *Detection runs locally*: **YES**.
+
+### Pros
+
+- When it comes to on-device voice engines, PicoVoice products are probably the best solution out there. Their hotword
+  engine is far more accurate than Snowboy and it manages to be even less CPU-intensive. Their speech engine has much
+  less delay than DeepSpeech and it’s also much less power-hungry — it will still run well and with low latency even on
+  older models of RaspberryPi.
+
+### Cons
+
+- While PicoVoice provides Python SDKs, their native libraries are closed source. It means that I couldn’t dig much into
+  how they’ve solved the problem.
+
+- Their hotword engine (Porcupine) can be installed and run free of charge for personal use on any device, but if you
+  want to expand the set of keywords provided by default, or add more samples to train the existing models, then you’ll
+  have to go for a commercial license. Their speech engine (Cheetah) instead can only be installed and run free of
+  charge for personal use on Linux on x86_64 architecture. Any other architecture or operating system, as well as any
+  chance to extend the model or use a different model, is only possible through a commercial license. While I understand
+  their point and their business model, I’d have been super-happy to just pay for a license through a more friendly
+  process, instead of relying on the old-fashioned “contact us for a commercial license/we’ll reach back to you”
+  paradigm.
+
+- Cheetah still suffers from some of the issues of DeepSpeech when it comes to semantic context/intent detection. The
+  “this/these” ambiguity also happens here. These problems can be partially solved by using Rhino, PicoVoice’s
+  speech-to-intent engine, which provides a structured representation of the speech intent instead of a
+  letter-by-letter transcription. However, I haven’t yet worked on integrating Rhino into Platypush.
+
+## Conclusions
+
+The democratization of voice technology has long been dreamed about, and it’s finally (slowly) coming. The situation out
+there is still quite fragmented, though, and some commercial SDKs may still get deprecated on short notice or with no
+notice at all. But at least some solutions are emerging to bring speech detection to all devices.
+
+I’ve built integrations in Platypush for all of these services because I believe that it’s up to users, not to
+businesses, to decide how people should use and benefit from voice technology. Moreover, having so many voice
+integrations in the same product, and especially having voice integrations that all expose the same API and generate
+the same events, makes it very easy to write assistant-agnostic logic, and to really decouple the task of speech
+recognition from the business logic that can be run by voice commands.
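+
+To make the idea of assistant-agnostic logic more concrete, here is a minimal, framework-free sketch in Python. It
+assumes that every engine reports recognized speech as an event with the same shape, modelled here as a dictionary
+with `engine` and `speech` keys. The `CommandRouter` class and the event layout are hypothetical and only illustrate
+the decoupling; they are not the Platypush API.
+
+```python
+from typing import Callable, Dict, Optional
+
+
+class CommandRouter:
+    """Map recognized phrases to handlers, regardless of the engine."""
+
+    def __init__(self) -> None:
+        self._handlers: Dict[str, Callable[[], str]] = {}
+
+    def on_phrase(self, phrase: str):
+        """Decorator that registers a handler for an exact phrase."""
+        def register(handler: Callable[[], str]) -> Callable[[], str]:
+            self._handlers[phrase.lower()] = handler
+            return handler
+        return register
+
+    def dispatch(self, event: Dict[str, str]) -> Optional[str]:
+        """Run the handler for the event's speech, if one is registered."""
+        # Only the transcript is inspected: the engine name is irrelevant
+        handler = self._handlers.get(event["speech"].strip().lower())
+        return handler() if handler else None
+
+
+router = CommandRouter()
+
+
+@router.on_phrase("turn on the lights")
+def lights_on() -> str:
+    # Placeholder for the real business logic
+    return "lights on"
+```
+
+Whether the event comes from `{"engine": "deepspeech", ...}` or `{"engine": "picovoice", ...}`, the same handler runs,
+so swapping the speech engine doesn’t require touching the command logic.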
+
+Check out
+[my previous article](https://blog.platypush.tech/article/Build-your-customizable-voice-assistant-with-Platypush) to
+learn how to write your own custom hooks in Platypush on speech detection, hotword detection and speech start/stop
+events.
+
+To summarize my findings so far:
+
+- Use the native **Google Assistant** integration if you want to have a full Google experience, and if you’re OK with
+  Google servers processing your audio and the possibility that at some point in the future the deprecated Google
+  Assistant library won’t work anymore.
+
+- Use the **Google push-to-talk** integration if you only want to have the assistant, without hotword detection, or you
+  want your assistant to be triggered by alternative hotwords.
+
+- Use the **Alexa** integration if you already have an Amazon-powered ecosystem and you’re ok with having less
+  flexibility when it comes to custom hooks because of the unavailability of speech transcript features in the AVS.
+
+- Use **Snowboy** if you want to use a flexible, open-source and crowd-powered engine for hotword detection that runs
+  on-device and/or use multiple assistants at the same time through different hotword models, even if the models may not
+  be that accurate.
+
+- Use **Mozilla DeepSpeech** if you want a fully on-device open-source engine powered by a robust Tensorflow model, even
+  if it comes with a higher CPU load and a bit more latency.
+
+- Use **PicoVoice** solutions if you want a full voice solution that runs on-device and is both accurate and
+  performant, even though you’ll need a commercial license to use it on some devices or to extend/change the model.
+
+Let me know your thoughts on these solutions and your experience with these integrations!