[//]: # (title: Build your customizable voice assistant with Platypush)

[//]: # (description: Use the available integrations to build a voice assistant with a simple microphone)

[//]: # (image: /img/voice-assistant-1.png)

[//]: # (author: Fabio Manganiello <fabio@platypush.tech>)

[//]: # (published: 2019-09-05)

My dream of a piece of software that you could simply talk to and get things done started more than 10 years ago, when I was still a young M.Sc student who imagined getting common tasks done on my computer through the same kind of natural interaction you see between Dave and [HAL 9000](https://en.wikipedia.org/wiki/HAL_9000) in [2001: A Space Odyssey](https://en.wikipedia.org/wiki/2001:_A_Space_Odyssey_(film)). Together with a friend I developed [Voxifera](https://github.com/BlackLight/Voxifera) way back in 2008. Although the software worked well enough for basic tasks, as long as I was the one providing the voice commands and the list of custom commands stayed below 10 items, Google and Amazon have in recent years gone way beyond what an M.Sc student alone could do with fast Fourier transforms and Markov models.

When, years later, I started building [Platypush](https://git.platypush.tech/platypush/platypush), I still dreamed of the same voice interface, leveraging the new technologies while not being caged by the interactions natively provided by those commercial assistants. My goal was still to talk to my assistant and get it to do whatever I wanted, regardless of the skills/integrations supported by the product, and regardless of whichever answer its AI was intended to provide for that phrase. Most of all, my goal was to have all the business logic of the actions run on my own device(s), not on someone else’s cloud. I feel that by now the goal has been mostly accomplished (assistant technology with 100% flexibility when it comes to phrase patterns and custom actions), and today I’d like to show you how to set up your own Google Assistant on steroids with a Raspberry Pi, a microphone and Platypush. I’ll also show how to run your custom hotword detection models through the [Snowboy](https://snowboy.kitt.ai/) integration, for those who want greater flexibility in how to summon their digital butler beyond the boring “Ok Google” formula, or who aren’t happy with the idea of having Google constantly listen to everything that is said in the room. If you are unfamiliar with Platypush, I suggest reading [my previous article](https://blog.platypush.tech/article/Ultimate-self-hosted-automation-with-Platypush) on what it is, what it can do, why I built it and how to get started with it.

## Context and expectations

First, a bit of context around the current state of the assistant integration (and of the available assistant APIs/SDKs in general).

My initial goal was to have a voice assistant that could:

1. Continuously listen through an audio device for a specific audio pattern or phrase and process the subsequent voice requests.

2. Support multiple hotword models, so that multiple phrases could be used to trigger a request process, and optionally associate a different assistant language with each hotword.

3. Support conversation start/end actions even without hotword detection, something like “start listening when I press a button or when I get close to a distance sensor”.

4. Provide the possibility to configure a list of custom phrases or patterns (ideally through [regular expressions](https://en.wikipedia.org/wiki/Regular_expression)) that, when matched, would run a custom pre-configured task or list of tasks on the executing device, or on any device connected through it (see the skeleton right after this list).

5. Fall back to the default request processing (e.g. Google’s standard response to “how’s the weather?” or “what’s on my calendar?”) whenever a phrase doesn’t match any of the pre-configured patterns.

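To anticipate where we’re heading, goals 4 and 5 map onto Platypush event hooks. This is just a skeleton of the shape such a rule takes — the phrase, the `shell.exec` action and the script path are placeholders, and working examples follow later in the article:

```yaml
event.hook.MyCustomPhrase:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        # The phrase supports regex-like patterns
        phrase: "run (my)? custom task"
    then:
        # Placeholder action: any Platypush plugin action can go here
        - action: shell.exec
          args:
              cmd: ~/bin/my_custom_task.sh
```
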
Basically, I needed an assistant SDK or API that could easily be wrapped into a library or tiny module: something that could listen for hotwords, start/stop conversations programmatically, and return the detected phrase directly back to my business logic whenever speech was recognized.

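In other words, something that could be reduced to an interface as simple as this (a purely illustrative Python sketch; the class and method names are hypothetical, not taken from any actual SDK):

```python
from typing import Callable, Optional


class VoiceAssistant:
    """Hypothetical SDK wrapper, for illustration only."""

    def __init__(self,
                 on_hotword: Optional[Callable[[str], None]] = None,
                 on_speech: Optional[Callable[[str], None]] = None):
        # Callbacks invoked on hotword detection and on recognized speech
        self.on_hotword = on_hotword
        self.on_speech = on_speech

    def start_conversation(self) -> None:
        """Programmatically open the mic and process one request."""

    def stop_conversation(self) -> None:
        """Programmatically stop the conversation in progress."""
```
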
I eventually decided to develop the integration with the Google Assistant and ignore Alexa, because:

- Alexa’s original [sample app](https://github.com/alexa/alexa-avs-sample-app.git) for developers was a relatively heavy piece of software that relied on a Java backend and a Node.js web service.

- In the meantime, Amazon has pulled the plug on that original project.

- The sample app has been replaced by the [Amazon AVS (Alexa Voice Service)](https://github.com/alexa/avs-device-sdk), a C++ service mostly aimed at commercial applications that doesn’t provide a decent quickstart for custom Python integrations.

- There are [a few Python examples for the Alexa SDK](https://developer.amazon.com/en-US/alexa/alexa-skills-kit/alexa-skill-python-tutorial#sample-python-projects), but they focus on how to develop a skill. I’m not interested in building a skill that runs on Amazon’s servers; I’m interested in detecting hotwords and raw speech on any device, and the SDK should let me do whatever I want with that.

I eventually opted for the [Google Assistant library](https://developers.google.com/assistant/sdk/guides/library/python/), but that has [recently been deprecated on short notice](https://github.com/googlesamples/assistant-sdk-python/issues/356), and there’s an ongoing discussion about what the future alternatives will be. However, the voice integration with Platypush still works, and whichever new SDK/API Google releases in the near future, I’ll make sure that it’s still supported. The two options currently provided are:

- If you’re running Platypush on an x86/x86_64 machine or on a Raspberry Pi earlier than the model 4 (except for the Raspberry Pi Zero, since it’s based on ARMv6 and the Assistant library wasn’t compiled for it), you can still use the Assistant library — even though it’s not guaranteed to work against future builds of the libc, given the deprecated status of the library.

- Otherwise, you can use the Snowboy integration for hotword detection together with Platypush’s wrapper around the Google push-to-talk sample for conversation support.

In this article we’ll see how to get started with both configurations.

## Installation and configuration

First things first: in order to get your assistant working you’ll need:

- An x86/x86_64/ARM device/OS compatible with Platypush and with either the Google Assistant library or Snowboy (tested on most of the Raspberry Pi models, on Banana Pi and Odroid, and on the ASUS Tinkerboard).

- A microphone. Literally any Linux-compatible microphone will work.

I’ll also assume that you have already installed Platypush on your device — the instructions are provided on the [Github page](https://git.platypush.tech/platypush/platypush), on the [wiki](https://git.platypush.tech/platypush/platypush/-/wikis/home#installation) and in my [previous article](https://blog.platypush.tech/article/Ultimate-self-hosted-automation-with-Platypush).

Follow these steps to get the assistant running:

- Install the required dependencies:

```shell
# To run the Google Assistant hotword service + speech detection
# (it won't work on the Raspberry Pi Zero and the ARMv6 architecture)
[sudo] pip install 'platypush[google-assistant-legacy]'

# To run just the Google Assistant speech detection and use
# Snowboy for hotword detection
[sudo] pip install 'platypush[google-assistant]'
```

- Follow [these steps](https://developers.google.com/assistant/sdk/guides/service/python/embed/config-dev-project-and-account) to create and configure a new project in the Google Console and download the required credentials files.

- Generate your user’s credentials file for the assistant to connect it to your account:

```shell
export CREDENTIALS_FILE=~/.config/google-oauthlib-tool/credentials.json

google-oauthlib-tool --scope https://www.googleapis.com/auth/assistant-sdk-prototype \
    --scope https://www.googleapis.com/auth/gcm \
    --save --headless --client-secrets $CREDENTIALS_FILE
```

- Open the prompted URL in your browser, log in with your Google account if needed and then enter the prompted authorization code in the terminal.

The above steps are common to both the Assistant library and the Snowboy+push-to-talk configurations. Let’s now tackle how to get things working with the Assistant library, provided that it still works on your device.

### Google Assistant library

- Enable the Google Assistant backend (to listen for the hotword) and plugin (to programmatically start/stop conversations in your custom actions) in your Platypush configuration file (by default `~/.config/platypush/config.yaml`):

```yaml
backend.assistant.google:
    enabled: True

assistant.google:
    enabled: True
```

- Refer to the official documentation to check the additional initialization parameters and actions provided by the [assistant backend](https://docs.platypush.tech/en/latest/platypush/backend/assistant.google.html) and [plugin](https://docs.platypush.tech/en/latest/platypush/plugins/assistant.google.html).

- Restart Platypush and keep an eye on the output to check that everything is alright. Oh, and also double-check that your microphone is not muted.

- Just say “OK Google” or “Hey Google”. The basic assistant should work out of the box.

### Snowboy + Google Assistant library

Follow the steps in this section if the Assistant library doesn’t work on your device (in most cases you’ll see a segmentation fault when you try to import it, caused by a mismatched libc version), if you want more options when it comes to supported hotwords, or if you don’t like the idea of having Google constantly listen to all of your conversations just to detect when you say the hotword.

```shell
# Install the Snowboy dependencies
[sudo] pip install 'platypush[hotword]'
```

- Go to the [Snowboy home page](https://snowboy.kitt.ai/), register/log in and then select the hotword model(s) you like. You’ll notice that before downloading a model you’ll be asked to provide three voice samples of yourself saying the hotword — a clever way to keep the voice models free while getting everyone to improve them.

- Configure the Snowboy backend and the Google push-to-talk plugin in your Platypush configuration. Example:

```yaml
backend.assistant.snowboy:
    audio_gain: 1.0
    models:
        computer:
            voice_model_file: ~/path/models/computer.umdl
            assistant_plugin: assistant.google.pushtotalk
            assistant_language: it-IT
            detect_sound: ~/path/sounds/sound1.wav
            sensitivity: 0.45

        ok_google:
            voice_model_file: ~/path/models/OK Google.pmdl
            assistant_plugin: assistant.google.pushtotalk
            assistant_language: en-US
            detect_sound: ~/path/sounds/sound2.wav
            sensitivity: 0.42

assistant.google.pushtotalk:
    language: en-US
```

A few words about the configuration tweaks:

- Tweak `audio_gain` to adjust the gain of your microphone (1.0 for 100% gain).

- `models` contains a key-value map of the voice models that you want to use.

- For each model you’ll have to specify:

    - `voice_model_file`: the model file downloaded from the Snowboy website.
    - `assistant_plugin`: the assistant plugin to be used (`assistant.google.pushtotalk` in this case).
    - `assistant_language`: the language code of the assistant conversation started when that hotword is detected (default: `en-US`).
    - `detect_sound` (optional): a WAV file that will be played when a conversation starts.
    - `sensitivity`: the sensitivity of the model, between 0 and 1, with 0 meaning no sensitivity and 1 very high sensitivity (tweak it to your own needs, but be aware that a value higher than 0.5 might trigger more false positives).

- The `assistant.google.pushtotalk` plugin configuration only requires the default language to be used by the assistant.

Refer to the official documentation for extra initialization parameters and methods provided by the [Snowboy backend](https://docs.platypush.tech/en/latest/platypush/backend/assistant.snowboy.html) and the [push-to-talk plugin](https://docs.platypush.tech/en/latest/platypush/plugins/assistant.google.pushtotalk.html).

Restart Platypush and check the logs for any errors, then say your hotword. If everything went well, an assistant conversation will be started when the hotword is detected.

## Create custom events on speech detected

So now that you’ve got the basic features of the assistant up and running, it’s time to customize the configuration and leverage the versatility of Platypush to get your assistant to run whatever you like when you say whichever phrase you like. You can create event hooks for any of the events triggered by the assistant (among them `SpeechRecognizedEvent`, `ConversationStartEvent`, `HotwordDetectedEvent` and `TimerEndEvent`), and those hooks can run anything that has a Platypush plugin. Let’s see an example that turns on your Philips Hue lights when you say “turn on the lights”:

```yaml
event.hook.AssistantTurnLightsOn:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        phrase: "turn on (the)? lights?"
    then:
        - action: light.hue.on
```

You’ll also notice that the assistant’s answer is suppressed if the detected phrase matches an existing rule. If you still want the assistant to speak a custom phrase, you can use the `tts` or `tts.google` plugins:

```yaml
event.hook.AssistantTurnOnLightsAnimation:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        phrase: "turn on (the)? animation"
    then:
        - action: light.hue.animate
          args:
              animation: color_transition
              transition_seconds: 0.25

        - action: tts.say
          args:
              text: Enjoy the light show
```

You can also programmatically start a conversation without using the hotword to trigger the assistant. For example, this is a rule that triggers the assistant whenever you press a Flic button:

```yaml
event.hook.FlicButtonStartConversation:
    if:
        type: platypush.message.event.button.flic.FlicButtonEvent
        btn_addr: 00:11:22:33:44:55
        sequence:
            - ShortPressEvent
    then:
        - action: assistant.google.start_conversation
        # or:
        # - action: assistant.google.pushtotalk.start_conversation
```

Additional win: if you have configured the HTTP backend and you have access to the web panel or the dashboard, you’ll notice that the status of the conversation also appears on the web page as a modal dialog, showing when a hotword has been detected, the recognized speech and the transcript of the assistant’s response.

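As a side note, the HTTP backend also makes it easy to test your assistant actions without speaking at all. A minimal sketch, assuming the backend runs on the same host on its default port (8008) and no access token is configured:

```shell
# Programmatically start a conversation over the HTTP API
curl -XPOST -H 'Content-Type: application/json' \
    -d '{"type": "request", "action": "assistant.google.start_conversation"}' \
    http://localhost:8008/execute
```
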
That’s all you need to know to customize your assistant — now you can, for instance, write rules that blink your lights when an assistant timer ends (see the sketch below), programmatically play your favourite playlist on mpd/mopidy when you say a particular phrase, or control a home-made multi-room music setup with Snapcast+Platypush through voice commands. As long as there’s a Platypush plugin to do what you want, you can already do it.

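For instance, a hook for the timer case could look like this sketch. It assumes the `light.hue` plugin is configured; treat the `blink` animation and its parameters as assumptions to verify against the `light.hue.animate` documentation:

```yaml
event.hook.BlinkLightsOnTimerEnd:
    if:
        type: platypush.message.event.assistant.TimerEndEvent
    then:
        # Assumed animation parameters: check the plugin docs
        - action: light.hue.animate
          args:
              animation: blink
              transition_seconds: 0.5
```
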
## Live demo

A [TL;DR video](https://photos.app.goo.gl/mCscTDFcB4SzazeK7) with a practical example:

In this video:

- Using the Google Assistant’s basic features (“how’s the weather?”) with the “OK Google” hotword (in English)

- Triggering a conversation in Italian when I say the “computer” hotword instead

- Supporting custom responses through the text-to-speech plugin

- Controlling the music through custom hooks that leverage mopidy as a backend (and synchronizing the music with devices in other rooms through the Snapcast plugin)

- Triggering a conversation without a hotword: in this case, a hook starts a conversation when something approaches a distance sensor on my Raspberry Pi

- Taking pictures from a camera on another Raspberry Pi, previewing them on the screen through Platypush’s camera plugins, and sending them to mobile devices through the Pushbullet or AutoRemote plugins

- Showing all the conversations and responses visually on the Platypush web dashboard