reading [my previous article](https://blog.platypush.tech/article/Ultimate-self-hosted-automation-with-Platypush) on
what it is, what it can do, why I built it and how to get started with it.
## Context and expectations
First, a bit of context on the current state of the assistant integration (and of the available assistant APIs/SDKs in general).
My initial goal was to have a voice assistant that could:
1. Continuously listen through an audio device for a specific audio pattern or phrase and process the subsequent voice
requests.
2. Support multiple models for the hotword, so that multiple phrases could be used to trigger a request process, and
optionally one could even associate a different assistant language to each hotword.
3. Support conversation start/end actions even without hotword detection — something like “start listening when I press
a button or when I get close to a distance sensor”.
4. Provide the possibility to configure a list of custom phrases or patterns (ideally
through [regular expressions](https://en.wikipedia.org/wiki/Regular_expression)) that, when matched, would run a
custom pre-configured task or list of tasks on the executing device, or on any device connected through it.
5. If a phrase doesn’t match any of those pre-configured patterns, then the assistant would fall back to its default
processing (e.g. Google’s standard responses to “how’s the weather?” or “what’s on my calendar?”).
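Points 4 and 5 boil down to a regex-based dispatch: try each configured pattern against the recognized phrase, run the associated task on a match, and fall back to the assistant's default processing otherwise. A minimal self-contained sketch of that logic (the phrases and task names below are made up for illustration, not platypush's actual API):

```python
import re

# Hypothetical hooks: each maps a regex over the recognized phrase
# to a callback that builds the task to run.
HOOKS = [
    (re.compile(r"play (?P<playlist>.+) on mopidy", re.IGNORECASE),
     lambda m: f"music.mpd.play(playlist={m.group('playlist')!r})"),
    (re.compile(r"turn (?P<state>on|off) the lights", re.IGNORECASE),
     lambda m: f"light.hue.{m.group('state')}()"),
]

def dispatch(phrase: str):
    """Return the task for the first matching hook, or None to signal
    that the assistant should process the phrase in the default way."""
    for pattern, handler in HOOKS:
        match = pattern.match(phrase)
        if match:
            return handler(match)
    return None  # no custom hook matched: let the assistant handle it
```

The important property is the `None` fall-through: unmatched phrases are not errors, they simply flow on to the standard assistant behaviour.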
Basically, I needed an assistant SDK or API that could easily be wrapped into a small library or module: one that could listen for hotwords, start and stop conversations programmatically, and hand any recognized speech straight back to my business logic.
I eventually decided to develop the integration with the Google Assistant and ignore Alexa because:
- Alexa’s original [sample app](https://github.com/alexa/alexa-avs-sample-app.git) for developers was a relatively heavy
piece of software that relied on a Java backend and a Node.js web service.
- In the meantime, Amazon has pulled the plug on that original project.
- The sample app has been replaced by the [Amazon AVS (Alexa Voice Service)](https://github.com/alexa/avs-device-sdk),
a C++ service aimed mostly at commercial applications, which doesn’t provide a decent quickstart for custom
Python integrations.
- There are [a few Python examples for the Alexa SDK](https://developer.amazon.com/en-US/alexa/alexa-skills-kit/alexa-skill-python-tutorial#sample-python-projects),
but they focus on how to develop a skill. I’m not interested in building a skill that runs on Amazon’s servers: I’m
interested in detecting hotwords and raw speech on any device, with an SDK that lets me do whatever I want with the result.
An additional win: if you have configured the HTTP backend and have access to the web panel or the dashboard, the
status of the conversation will also appear on the web page as a modal dialog, showing when a hotword has been
detected, the recognized speech, and the transcript of the assistant’s response.
That’s all you need to know to customize your assistant: you can now, for instance, write rules that blink your
lights when an assistant timer ends, programmatically play your favourite playlist on mpd/mopidy when you say a
particular phrase, or control a home-made multi-room music setup with Snapcast+platypush through voice commands. As long
as there’s a platypush plugin for what you want to do, you can already do it.
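As a sketch of what such a rule looks like, here is a YAML event hook in the style of platypush's configuration file. The hook name, phrase, and action are purely illustrative, and the exact event class paths may vary between versions, so check the documentation of your installed release:

```yaml
# Hypothetical hook: play music on mopidy/mpd when a specific phrase
# is recognized by the assistant. Names are illustrative.
event.hook.PlayMusicOnAssistantPhrase:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        phrase: "play the music"
    then:
        action: music.mpd.play
```

The `if` block matches on the event type (and, here, the recognized phrase), while the `then` block lists the plugin actions to run, so any plugin action can be wired to any assistant event this way.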
## Live demo
A [TL;DR video](https://photos.app.goo.gl/mCscTDFcB4SzazeK7) with a practical example:
In this video:
- Using the Google Assistant's basic features ("how's the weather?") with the "OK Google" hotword (in English)
- Triggering a conversation in Italian with the "computer" hotword instead
- Serving custom responses through the text-to-speech plugin
- Controlling the music through custom hooks that use mopidy as a backend (and synchronizing playback with devices in other rooms through the Snapcast plugin)
- Triggering a conversation without a hotword: in this case through a hook that starts a conversation when something approaches a distance sensor on my Raspberry Pi
- Taking pictures from a camera on another Raspberry Pi, previewing them on the screen through platypush's camera plugins, and sending them to mobile devices through the Pushbullet or AutoRemote plugins
- Showing all the conversations and responses on the platypush web dashboard