[//]: # (title: Build your customizable voice assistant with Platypush)
[//]: # (description: Use the available integrations to build a voice assistant with a simple microphone)
[//]: # (image: /img/voice-assistant-1.png)
[//]: # (published: 2019-09-05)

My dream of a piece of software that you could simply talk to and get things done started more than 10 years ago, when I was still a young M.Sc. student who imagined getting common tasks done on my computer through the same kind of natural interaction you see between Dave and [HAL 9000](https://en.wikipedia.org/wiki/HAL_9000) in [2001: A Space Odyssey](https://en.wikipedia.org/wiki/2001:_A_Space_Odyssey_(film)).

Together with a friend I developed [Voxifera](https://github.com/BlackLight/Voxifera) way back in 2008. Although the software worked well enough for basic tasks (as long as it was always me providing the voice commands, and as long as the list of custom commands stayed below 10 items), Google and Amazon have in recent years gone way beyond what an M.Sc. student alone could do with fast Fourier transforms and Markov models.

When, years later, I started building [Platypush](https://git.platypush.tech/platypush/platypush), I still dreamed of the same voice interface: one that leveraged the new technologies without being caged by the interactions natively provided by those commercial assistants. My goal was still to talk to my assistant and get it to do whatever I wanted, regardless of the skills/integrations supported by the product and regardless of whatever answer its AI was meant to provide for that phrase. And, most of all, my goal was to have all the business logic of the actions run on my own device(s), not on someone else’s cloud.

I feel like that goal has by now been mostly accomplished (assistant technology with 100% flexibility when it comes to phrase patterns and custom actions), and today I’d like to show you how to set up your own Google Assistant on steroids with a Raspberry Pi, a microphone and platypush. I’ll also show how to run your custom hotword detection models through the [Snowboy](https://snowboy.kitt.ai/) integration, for those who want greater flexibility in how to summon their digital butler beyond the boring “Ok Google” formula, or for those who aren’t too happy with the idea of having Google constantly listening to everything that is said in the room.

For those who are unfamiliar with platypush, I suggest reading [my previous article](https://blog.platypush.tech/article/Ultimate-self-hosted-automation-with-Platypush) on what it is, what it can do, why I built it and how to get started with it.

## Context and expectations

First, a bit of context around the current state of the assistant integration (and the state of the available assistant APIs/SDKs in general). My initial goal was to have a voice assistant that could:

1. Continuously listen through an audio device for a specific audio pattern or phrase and process the subsequent voice requests.
2. Support multiple models for the hotword, so that multiple phrases could be used to trigger a request process, optionally each associated with a different assistant language.
3. Support conversation start/end actions even without hotword detection — something like “start listening when I press a button or when I get close to a distance sensor”.
4. Provide the possibility to configure a list of custom phrases or patterns (ideally through [regular expressions](https://en.wikipedia.org/wiki/Regular_expression)) that, when matched, would run a custom pre-configured task, or list of tasks, on the executing device or on any device connected through it.
5. If a phrase doesn’t match any of those pre-configured patterns, let the assistant process the request in the default way (e.g. rely on Google’s standard responses to “how’s the weather?” or “what’s on my calendar?”).

Basically, I needed an assistant SDK or API that could easily be wrapped into a library or a tiny module: something that could listen for hotwords, start and stop conversations programmatically, and return the detected phrase directly to my business logic whenever speech was recognized (a small configuration sketch of what the last two points look like in platypush is shown at the end of this section).

I eventually decided to develop the integration with the Google Assistant and to ignore Alexa, because:

- Alexa’s original [sample app](https://github.com/alexa/alexa-avs-sample-app.git) for developers was a relatively heavy piece of software that relied on a Java backend and a Node.js web service.
- In the meantime, Amazon has pulled the plug on that original project.
- The sample app has been replaced by the [Amazon AVS (Alexa Voice Service)](https://github.com/alexa/avs-device-sdk), a C++ service mostly aimed at commercial applications that doesn’t provide a decent quickstart for custom Python integrations.
- There are [a few Python examples for the Alexa SDK](https://developer.amazon.com/en-US/alexa/alexa-skills-kit/alexa-skill-python-tutorial#sample-python-projects), but they focus on how to develop a skill. I’m not interested in building a skill that runs on Amazon’s servers — I’m interested in detecting hotwords and raw speech on any device, and the SDK should let me do whatever I want with that.
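To make the last two goals in the list above more concrete, here is a minimal, purely illustrative sketch of how a custom phrase can be mapped to a local action through a platypush event hook, assuming that the assistant integration itself is already enabled and configured. The hook name, the example phrase and the `light.hue.on` action with its arguments are placeholders of mine, not something prescribed by the integration; any phrase that doesn’t match a configured hook simply falls back to the assistant’s default behaviour.

```yaml
# config.yaml (illustrative sketch; assumes the assistant integration is configured)

# Run a local action whenever the assistant recognizes a matching phrase
event.hook.TurnOnLivingRoomLights:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        phrase: "turn on the living room lights"
    then:
        action: light.hue.on     # e.g. a Philips Hue plugin action
        args:
            groups:
                - Living Room
```

The `phrase` condition can also extract parameters from the recognized text (something like `play ${title} by ${artist}`), which is what covers the pattern-matching requirement in point 4 without touching the assistant’s cloud logic.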