assistant-sample
This is a sample voice assistant based on Platypush.
It provides a full-featured voice assistant with some sample commands, which can be extended to your liking.
Architecture
The assistant leverages the following Platypush plugins:
- Hotword Detection: assistant.openwakeword. It starts a conversation when the hotword (e.g. "Alexa") is detected.
- Speech Recognition: assistant.vosk. It transcribes the audio into text.
- AI Responses: openai. It is used either to:
  - Answer generic questions that don't match any preset voice commands, or
  - Translate unstructured transcriptions of voice commands into structured intents to be processed by other plugins.
- Speech Synthesis: tts.piper. It converts response text into speech.
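The four stages above form a linear pipeline: audio is checked for the hotword, transcribed, answered, and spoken back. The sketch below models that data flow with plain Python callables standing in for the plugins. `VoicePipeline` and its fields are hypothetical names used only for illustration; the real service wires these stages together through Platypush events rather than direct function calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    # Hypothetical stand-ins for the plugins above; in the real service each
    # stage is a Platypush plugin driven by events, not a Python callable.
    detect_hotword: Callable[[bytes], bool]  # role of assistant.openwakeword
    transcribe: Callable[[bytes], str]       # role of assistant.vosk
    respond: Callable[[str], str]            # role of openai
    synthesize: Callable[[str], bytes]       # role of tts.piper

    def handle(self, wake_audio: bytes, speech_audio: bytes) -> Optional[bytes]:
        # No conversation starts until the hotword is detected
        if not self.detect_hotword(wake_audio):
            return None
        text = self.transcribe(speech_audio)  # audio -> text
        reply = self.respond(text)            # text -> response text
        return self.synthesize(reply)         # response text -> audio


# Toy wiring with fake stages, just to show the data flow
pipeline = VoicePipeline(
    detect_hotword=lambda audio: audio == b"alexa",
    transcribe=lambda audio: audio.decode(),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode(),
)
```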
Optional plugins
- Weather: weather.openweathermap. It fetches the current weather in the configured location.
- Music: music.mpd. It controls the MPD music player.
- Lighting: light.hue. It controls Philips Hue lights.
Configuration
Models
Hotword Detection
When the service starts for the first time, it automatically downloads all the available models.
You can then use the following command to list the available models:
curl -s -XPOST \
-H 'Content-type: application/json' \
-H "Authorization: Bearer $PLATYPUSH_TOKEN" \
-d '{"type":"request", "action":"assistant.openwakeword.list_models"}' \
http://localhost:8008/execute
Here, $PLATYPUSH_TOKEN is the API token of the user that is running the service.
You can retrieve it by connecting to http://localhost:8008 when the service
starts for the first time. Create your credentials, then select Settings ->
Tokens -> Generate API Token.
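The same `list_models` request can also be sent from Python using only the standard library. This is a sketch that mirrors the curl call above; the helper names are hypothetical, and the optional `args` field for actions that take parameters is an assumption about Platypush's request format rather than something this README documents.

```python
from __future__ import annotations

import json
import urllib.request


def build_execute_request(
    action: str, token: str, base_url: str = "http://localhost:8008", **args
) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload: dict = {"type": "request", "action": action}
    if args:
        # Assumption: action parameters go under an "args" key
        payload["args"] = args
    return urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def execute(action: str, token: str, base_url: str = "http://localhost:8008", **args):
    """Send the request and decode the JSON response body."""
    request = build_execute_request(action, token, base_url, **args)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs the service running and PLATYPUSH_TOKEN set):
# import os
# print(execute("assistant.openwakeword.list_models", os.environ["PLATYPUSH_TOKEN"]))
```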
Speech-to-text
A full list of the Vosk voice models is available at https://alphacephei.com/vosk/models.
Some feedback about the quality of the English models:
| Model | Size | Notes |
|---|---|---|
| vosk-model-small-en-us-0.15 | 40 MB | Very fast and lightweight model that can also run on an old Raspberry Pi, but accuracy can be low. |
| vosk-model-en-us-0.22-lgraph | 128 MB | Reasonably accurate on clear speech and with native speakers, but still small enough to run fine even on a Raspberry Pi. |
| vosk-model-en-us-0.22 | 1.8 GB | Accurate generic US English model. Fast on a laptop or x86 processor, but it may be a bit heavy on a Raspberry Pi. |
Download the selected model to the Docker volume working directory:
mkdir -p ./workdir/assistant.vosk/models
cd ./workdir/assistant.vosk/models
wget "https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip"
unzip "vosk-model-en-us-0.22-lgraph.zip"
rm "vosk-model-en-us-0.22-lgraph.zip"
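If `wget` and `unzip` aren't available on the host, the three steps above can be approximated with Python's standard library. A minimal sketch: `install_zip_model` is a hypothetical helper, and it assumes the archive unpacks into a single top-level folder named after the model, as the Vosk zips are expected to.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def install_zip_model(url: str, models_dir: Path) -> Path:
    """Download a zipped model, extract it into models_dir and return its folder."""
    models_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "model.zip"
        # Stream the download to a temporary file (the `wget` step)
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        # Extract into the models directory (the `unzip` step)
        with zipfile.ZipFile(archive) as zf:
            # Assumption: the archive has one top-level folder named after the model
            top_level = sorted({name.split("/")[0] for name in zf.namelist()})
            zf.extractall(models_dir)
    # Leaving the `with` block deletes the temporary archive (the `rm` step)
    return models_dir / top_level[0]


# Usage:
# install_zip_model(
#     "https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip",
#     Path("workdir/assistant.vosk/models"),
# )
```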
Text-to-speech
Download a speech synthesis model from the Piper voices repository at https://huggingface.co/rhasspy/piper-voices.
Audio samples are also available to get an idea of the type of voice before downloading.
The model usually consists of a *.onnx and a *.onnx.json file. Download
both of them to the Docker volume working directory:
mkdir -p ./workdir/piper_tts
cd ./workdir/piper_tts
wget "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/hfc_female/medium/en_US-hfc_female-medium.onnx"
wget "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/hfc_female/medium/en_US-hfc_female-medium.onnx.json"
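Because a voice is only usable when both files are present, a quick check for incomplete downloads can help. This sketch assumes the `<voice>.onnx` / `<voice>.onnx.json` naming shown in the URLs above; `find_piper_voices` is a hypothetical helper, not part of Platypush or Piper.

```python
from __future__ import annotations

from pathlib import Path


def find_piper_voices(voices_dir: Path) -> dict[str, tuple[Path, Path]]:
    """Map each voice name to its (.onnx, .onnx.json) pair, skipping incomplete ones."""
    voices = {}
    # "*.onnx" matches the model files only, not the "*.onnx.json" configs
    for model in sorted(voices_dir.glob("*.onnx")):
        config = model.with_name(model.name + ".json")
        if config.is_file():
            voices[model.stem] = (model, config)
    return voices


# Usage:
# print(find_piper_voices(Path("workdir/piper_tts")))
```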
Platypush Configuration
Copy and edit the example configuration file:
cp config/config.example.yaml config/config.yaml
Build
Build the container image for the assistant service:
docker build -t platypush-voice .
Run
The assistant needs access to the host microphone and speakers. The container routes ALSA through PulseAudio, so the examples below connect it to a PulseAudio server running on the host.
Linux
With PulseAudio installed:
docker run --rm \
-e PULSE_SERVER=unix:/run/pulse/native \
-v /run/user/$(id -u)/pulse/native:/run/pulse/native \
--name voice-assistant \
-p 8008:8008 \
-v ./config:/etc/platypush \
-v ./workdir:/var/lib/platypush \
platypush-voice
macOS
Install and start PulseAudio on the host:
brew install pulseaudio
pulseaudio --daemonize=yes --exit-idle-time=-1
pactl load-module module-native-protocol-tcp \
auth-anonymous=1 \
listen=0.0.0.0 \
port=4713
Then start the container:
docker run --rm \
-e PULSE_SERVER=tcp:host.docker.internal:4713 \
--name voice-assistant \
-p 8008:8008 \
-v "$(pwd)/config:/etc/platypush" \
-v "$(pwd)/workdir:/var/lib/platypush" \
platypush-voice
If pactl load-module reports that the module is already loaded, you can keep
using the existing PulseAudio daemon.
Windows
Install PulseAudio for Windows, then create a default.pa file in the same
directory as pulseaudio.exe:
load-module module-waveout sink_name=output source_name=input record=1
load-module module-native-protocol-tcp auth-anonymous=1 listen=0.0.0.0 port=4713
set-default-sink output
set-default-source input
Start PulseAudio from PowerShell:
.\pulseaudio.exe -F .\default.pa --exit-idle-time=-1
Then start the container from the repository directory:
docker run --rm `
-e PULSE_SERVER=tcp:host.docker.internal:4713 `
--name voice-assistant `
-p 8008:8008 `
-v "${PWD}/config:/etc/platypush" `
-v "${PWD}/workdir:/var/lib/platypush" `
platypush-voice
Make sure microphone access is enabled for desktop applications under Windows privacy settings, and allow PulseAudio through the firewall if prompted.
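If the assistant can't reach the audio server, a connectivity check against the `PULSE_SERVER` value can tell whether the socket or port is the problem. A minimal sketch: it only understands the two forms used in this README (`unix:<path>` and `tcp:<host>:<port>`, falling back to PulseAudio's default TCP port 4713 when the port is omitted), and the helper names are hypothetical. On the macOS or Windows host you would check `tcp:localhost:4713`; the `unix:` branch needs a platform with Unix domain sockets.

```python
from __future__ import annotations

import socket

PULSE_DEFAULT_TCP_PORT = 4713


def parse_pulse_server(value: str) -> tuple[str, str | tuple[str, int]]:
    """Split 'unix:<path>' or 'tcp:<host>[:<port>]' into (kind, address)."""
    kind, _, rest = value.partition(":")
    if kind == "unix":
        return "unix", rest
    if kind == "tcp":
        host, sep, port = rest.rpartition(":")
        if not sep:
            # No explicit port: use PulseAudio's default TCP port
            return "tcp", (rest, PULSE_DEFAULT_TCP_PORT)
        return "tcp", (host, int(port))
    raise ValueError(f"unsupported PULSE_SERVER value: {value!r}")


def pulse_server_reachable(value: str, timeout: float = 2.0) -> bool:
    """Return True if a connection to the PulseAudio server can be opened."""
    kind, address = parse_pulse_server(value)
    try:
        if kind == "tcp":
            with socket.create_connection(address, timeout=timeout):
                return True
        # Unix domain sockets are not available on every platform (e.g. Windows)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(address)
            return True
    except OSError:
        return False
```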
Usage
Once the service is running, you can start interacting with it through voice commands (the default activation word is "Alexa").
Any questions about the weather will be answered by the weather plugin, if it's enabled.
If the music or lights plugins are enabled, they can be controlled with voice commands ("stop the music", "turn on the lights", etc.).
Otherwise, the assistant will use the openai plugin to respond to your
questions, with follow-up turns when the response from OpenAI is also a question.
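The routing described above (dedicated plugins first, OpenAI as the fallback) can be pictured as a simple dispatcher. This is a hypothetical keyword-based sketch for illustration only: the rules in `ROUTES` are invented, and the actual hooks under config/scripts may instead rely on OpenAI to turn transcriptions into intents, as described in the Architecture section.

```python
from __future__ import annotations

import re

# Hypothetical keyword rules: (plugin, pattern suggesting the request is for it)
ROUTES = [
    ("weather.openweathermap", re.compile(r"\b(weather|temperature|rain)\b", re.IGNORECASE)),
    ("music.mpd", re.compile(r"\b(music|song|volume)\b", re.IGNORECASE)),
    ("light.hue", re.compile(r"\blights?\b", re.IGNORECASE)),
]


def route(phrase: str, enabled_plugins: set[str]) -> str:
    """Pick the first enabled plugin whose rule matches, else fall back to OpenAI."""
    for plugin, pattern in ROUTES:
        if plugin in enabled_plugins and pattern.search(phrase):
            return plugin
    return "openai"
```

Disabled plugins are skipped, so the same question falls through to OpenAI when, for example, the weather plugin isn't configured.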
Extending the Assistant
The assistant logic is modeled through simple Platypush hooks under
config/scripts.
You can extend it as you like by defining your own hooks or modifying the existing ones.
The API is relatively straightforward:
import logging

from platypush import run, when
from platypush.events.assistant import (
    ConversationEndEvent,
    ConversationStartEvent,
    HotwordDetectedEvent,
    SpeechRecognizedEvent,
)

logger = logging.getLogger(__name__)

ai_plugin = "openai"
assistant_plugin = "assistant.vosk"


@when(HotwordDetectedEvent)
def on_hotword_detected(event: HotwordDetectedEvent):
    """
    Something to do when the hotword is detected.
    """
    logger.info("Hotword %s detected", event.hotword)
    run(f"{assistant_plugin}.start_conversation")


@when(ConversationStartEvent)
def on_conversation_start():
    """
    Something to do when a conversation starts.
    """
    logger.info("Voice assistant conversation started")


@when(SpeechRecognizedEvent, plugin=assistant_plugin)
def on_speech_recognized(event: SpeechRecognizedEvent):
    """
    Generic handler for speech recognition events received
    by the configured assistant plugin.
    """
    logger.info("Recognized speech: %s", event.phrase)

    # Forward the request to OpenAI and render the response as speech
    response = run(
        f"{ai_plugin}.get_response",
        prompt=event.phrase,
        context=[
            {
                "role": "system",
                "content": (
                    "You are a voice assistant that can answer questions and perform actions. "
                    "Keep in mind that prompts are transcriptions of user speech and they may "
                    "contain misspellings or errors. Try and interpret them as best as possible. "
                    "When possible, keep your answers short and concise."
                ),
            }
        ],
    )

    # If the response is not empty, render it using the TTS plugin
    if response:
        event.assistant.render_response(response)


@when(SpeechRecognizedEvent, phrase="turn on (the)? lights")
def turn_on_lights():
    """
    Hook run when the user says "turn on the lights" (regex)
    """
    run("light.hue.on")


@when(SpeechRecognizedEvent, phrase="play (the)? music")
def play_music():
    """
    Hook run when the user says "play the music" (regex)
    """
    run("music.mpd.play")


@when(SpeechRecognizedEvent, phrase="set the music volume (to|on|at) ${volume}")
def set_volume(volume: int):
    """
    Hook run when the user says "set the music volume to ${volume}"
    (regex with parameter).
    """
    run("music.mpd.set_volume", volume=volume)
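The `phrase` patterns above mix plain regex (`(the)?`, `(to|on|at)`) with `${name}` placeholders whose matched text reaches the hook as a keyword argument, as the `set_volume(volume)` signature suggests. The sketch below shows one way such a template could be turned into a regular expression with named groups. It is an illustration under that assumption, not Platypush's actual matcher, which may normalize case and whitespace differently; `compile_phrase` and `match_phrase` are hypothetical names.

```python
from __future__ import annotations

import re


def compile_phrase(template: str) -> re.Pattern:
    """Turn a template like 'set the music volume (to|on|at) ${volume}' into a regex."""
    # Each ${name} placeholder becomes a named group capturing one or more characters
    pattern = re.sub(r"\$\{(\w+)\}", r"(?P<\1>.+?)", template)
    # Anchor the whole phrase and ignore case, since transcriptions vary
    return re.compile(rf"^\s*{pattern}\s*$", re.IGNORECASE)


def match_phrase(template: str, phrase: str) -> dict[str, str] | None:
    """Return the captured placeholder values, or None if the phrase doesn't match."""
    match = compile_phrase(template).match(phrase)
    return match.groupdict() if match else None
```

Captured values are strings, and a speech recognizer may spell numbers out as words ("fifty"), so a hook like `set_volume` may need to convert its argument before passing it on.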