[//]: # (title: Build your customizable voice assistant with Platypush)
[//]: # (description: Use the available integrations to build a voice assistant with a simple microphone)
[//]: # (image: /img/voice-assistant-1.png)
[//]: # (published: 2019-09-05)

My dream of a piece of software that you could simply talk to and get things done started more than 10 years ago, when I
was still a young M.Sc. student who imagined getting common tasks done on my computer through the same kind of natural
interaction you see between Dave and [HAL 9000](https://en.wikipedia.org/wiki/HAL_9000) in
[2001: A Space Odyssey](https://en.wikipedia.org/wiki/2001:_A_Space_Odyssey_(film)). Together with a friend I developed
[Voxifera](https://github.com/BlackLight/Voxifera) way back in 2008. Although the software worked well enough for basic
tasks, as long as I was always the one providing the voice commands and the list of custom commands stayed below ten
items, Google and Amazon have in recent years gone way beyond what an M.Sc. student alone could do with fast Fourier
transforms and Markov models.

When, years later, I started building [Platypush](https://git.platypush.tech/platypush/platypush), I still dreamed of the
same voice interface, leveraging the new technologies while not being caged by the interactions natively provided by
those commercial assistants. My goal was still to talk to my assistant and get it to do whatever I wanted, regardless of
the skills/integrations supported by the product and regardless of whichever answer its AI was meant to provide for that
phrase. Most of all, my goal was to have all the business logic of the actions run on my own device(s), not on someone
else's cloud. I feel that by now that goal has been mostly accomplished (assistant technology with 100% flexibility when
it comes to phrase patterns and custom actions), and today I'd like to show you how to set up your own Google Assistant
on steroids with a Raspberry Pi, a microphone and Platypush. I'll also show how to run your custom hotword detection
models through the [Snowboy](https://snowboy.kitt.ai/) integration, for those who want greater flexibility in how to
summon their digital butler beyond the boring "Ok Google" formula, or who aren't happy with the idea of having Google
constantly listen to everything that is said in the room. For those who are unfamiliar with Platypush, I suggest reading
[my previous article](https://blog.platypush.tech/article/Ultimate-self-hosted-automation-with-Platypush) on what it is,
what it can do, why I built it and how to get started with it.

## Context and expectations

First, a bit of context around the current state of the assistant integration (and the state of the available
assistant APIs/SDKs in general).

My initial goal was to have a voice assistant that could:

1. Continuously listen through an audio device for a specific audio pattern or phrase and process the subsequent voice
requests.

2. Support multiple models for the hotword, so that multiple phrases could be used to trigger a request process, and
optionally one could even associate a different assistant language to each hotword.

3. Support conversation start/end actions even without hotword detection — something like "start listening when I press
a button or when I get close to a distance sensor".

4. Provide the possibility to configure a list of custom phrases or patterns (ideally
through [regular expressions](https://en.wikipedia.org/wiki/Regular_expression)) that, when matched, would run a
custom pre-configured task or list of tasks on the executing device, or on any device connected through it (see the
sketch after this list).

5. If a phrase doesn't match any of those pre-configured patterns, then the assistant would go on and process the
request in the default way (e.g. rely on Google's "how's the weather?" or "what's on my calendar?" standard response).

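
To give an idea of what point 4 could look like in practice, here is a minimal sketch of a Platypush event hook that
reacts to a recognized phrase, assuming the YAML hook syntax and the `SpeechRecognizedEvent` fired by the assistant
integration. The hook name, the phrase and the shell command are placeholders of mine, and exact option names may vary
slightly between Platypush versions:

```yaml
# config.yaml (sketch): run a custom task when a matching phrase is recognized.
# "OpenGarageVoiceCommand", the phrase and the script path are made-up examples.
event.hook.OpenGarageVoiceCommand:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        phrase: "open the garage"
    then:
        - action: shell.exec
          args:
            cmd: '~/bin/open_garage.sh'
```

Any phrase that doesn't match a hook like this one falls through to the assistant's default behaviour, which is exactly
point 5 above.
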
Basically, I needed an assistant SDK or API that could easily be wrapped into a small library or module: something that
could listen for hotwords, start and stop conversations programmatically, and hand any recognized speech directly back
to my own business logic.

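
As a sketch of what "start conversations programmatically" means in Platypush terms, a hook can react to a hotword
event and explicitly open a conversation. Again, this is an assumption-laden example from my side rather than a
reference configuration: the hotword name is made up, the `hotword` attribute and the `assistant.google.start_conversation`
action are recalled from the integration and may differ across versions or depending on which assistant plugin you
configure:

```yaml
# Sketch: open a conversation whenever a custom hotword is detected.
# "computer" is just an example hotword name.
event.hook.OnCustomHotwordDetected:
    if:
        type: platypush.message.event.assistant.HotwordDetectedEvent
        hotword: computer
    then:
        - action: assistant.google.start_conversation
```
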
I eventually decided to develop the integration with the Google Assistant and ignore Alexa because:

- Alexa's original [sample app](https://github.com/alexa/alexa-avs-sample-app.git) for developers was a relatively heavy
piece of software that relied on a Java backend and a Node.js web service.

- In the meantime, Amazon has pulled the plug on that original project.

- The sample app has been replaced by the [Amazon AVS (Alexa Voice Service)](https://github.com/alexa/avs-device-sdk),
which is a C++ service mostly aimed at commercial applications and doesn't provide a decent quickstart for custom
Python integrations.

- There are a [few Python examples for the Alexa SDK](https://developer.amazon.com/en-US/alexa/alexa-skills-kit/alexa-skill-python-tutorial#sample-python-projects),
but they focus on how to develop a skill. I'm not interested in building a skill that runs on Amazon's servers — I'm
interested in detecting hotwords and raw speech on any device, and the SDK should let me do whatever I want with that.