A sample voice assistant based on Platypush
Fabio Manganiello fa54f44f1d
chore: Add PulseAudio utils and document audio device pinning
- Install pulseaudio-utils for pactl device selection
- Add input/output device and volume config examples for openwakeword/vosk/piper
2026-06-22 22:07:36 +02:00
config chore: Add PulseAudio utils and document audio device pinning 2026-06-22 22:07:36 +02:00
workdir/sounds feat: Add Dockerized Platypush voice assistant sample 2026-06-22 15:11:03 +02:00
.gitignore feat: Add Dockerized Platypush voice assistant sample 2026-06-22 15:11:03 +02:00
Dockerfile chore: Add PulseAudio utils and document audio device pinning 2026-06-22 22:07:36 +02:00
LICENSE Initial commit 2026-06-22 02:49:45 +02:00
README.md feat: Add Dockerized Platypush voice assistant sample 2026-06-22 15:11:03 +02:00

assistant-sample


This is a sample voice assistant based on Platypush.

It provides a full-featured voice assistant with some sample commands, which can be extended to your liking.

Architecture

The assistant leverages the following Platypush plugins:

  • Hotword Detection: assistant.openwakeword. It starts a conversation when the hotword (e.g. "Alexa") is detected.

  • Speech Recognition: assistant.vosk. It transcribes the audio into text.

  • AI Responses: openai. Used to:

    • Answer generic questions that don't match any preset voice commands
    • Translate unstructured transcriptions of voice commands into structured intents to be processed by other plugins.
  • Speech Synthesis: tts.piper. It converts response text into speech.

Optional plugins

Configuration

Models

Hotword Detection

When the service starts for the first time, it automatically downloads all the available hotword models.

You can then use the following command to list the available models:

curl -s -XPOST \
     -H 'Content-type: application/json' \
     -H "Authorization: Bearer $PLATYPUSH_TOKEN" \
     -d '{"type":"request", "action":"assistant.openwakeword.list_models"}' \
     http://localhost:8008/execute

where $PLATYPUSH_TOKEN is an API token for the user that runs the service.

To get one, open http://localhost:8008 after the service starts for the first time, create your credentials, then select Settings -> Tokens -> Generate API Token.
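The same call can be made from Python using only the standard library. This is a minimal sketch that assumes the service listens on http://localhost:8008, as in the curl example, and that action arguments, when needed, go under an "args" key of the request body; `build_request` and `execute` are hypothetical helper names, not part of Platypush.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

PLATYPUSH_URL = "http://localhost:8008/execute"


def build_request(action: str, token: str, args: Optional[Dict[str, Any]] = None,
                  url: str = PLATYPUSH_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    payload: Dict[str, Any] = {"type": "request", "action": action}
    if args:
        # Assumption: action arguments are passed under the "args" key
        payload["args"] = args
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def execute(action: str, token: str, args: Optional[Dict[str, Any]] = None) -> Any:
    """Send the request to the running service and decode its JSON response."""
    with urllib.request.urlopen(build_request(action, token, args), timeout=10) as resp:
        return json.load(resp)
```

For example, `execute("assistant.openwakeword.list_models", os.environ["PLATYPUSH_TOKEN"])` (after `import os`) should return the same JSON as the curl command.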

Speech-to-text

A full list of the Vosk voice models is available at https://alphacephei.com/vosk/models.

Some notes on the quality of the English models:

| Model | Size | Notes |
|---|---|---|
| vosk-model-small-en-us-0.15 | 40 MB | Very fast, lightweight model that can run even on an old Raspberry Pi, but accuracy can be low. |
| vosk-model-en-us-0.22-lgraph | 128 MB | Reasonably accurate on clear speech from native speakers, yet small enough to run well on a Raspberry Pi. |
| vosk-model-en-us-0.22 | 1.8 GB | Accurate general-purpose US English model. Fast on a laptop or other x86 machine, but it may be a bit heavy for a Raspberry Pi. |

Download the selected model to the Docker volume working directory (run these commands from the repository root):

mkdir -p ./workdir/assistant.vosk/models
cd ./workdir/assistant.vosk/models
wget "https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip"
unzip "vosk-model-en-us-0.22-lgraph.zip"
rm "vosk-model-en-us-0.22-lgraph.zip"
cd -
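If you prefer to script the download, this is a minimal standard-library Python equivalent of the shell commands above. It assumes archive URLs follow the https://alphacephei.com/vosk/models/<model>.zip pattern used in the wget command; `extract_model` and `download_model` are illustrative names.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

VOSK_MODELS_URL = "https://alphacephei.com/vosk/models"


def extract_model(archive: Path, models_dir: Path) -> List[str]:
    """Unzip a Vosk model archive into models_dir (like `unzip`), delete the
    archive (like `rm`) and return the top-level entries it contained."""
    models_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(models_dir)
        top_level = sorted({name.split("/", 1)[0] for name in zf.namelist()})
    archive.unlink()
    return top_level


def download_model(name: str, models_dir: Path) -> List[str]:
    """Download <name>.zip (like `wget`) and extract it into models_dir."""
    models_dir.mkdir(parents=True, exist_ok=True)
    archive = models_dir / f"{name}.zip"
    urllib.request.urlretrieve(f"{VOSK_MODELS_URL}/{name}.zip", archive)
    return extract_model(archive, models_dir)
```

Calling `download_model("vosk-model-en-us-0.22-lgraph", Path("./workdir/assistant.vosk/models"))` returns the extracted top-level folders, which for the Vosk archives should be a single folder named after the model.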

Text-to-speech

Download a speech synthesis model from the Piper voices repository on Hugging Face (https://huggingface.co/rhasspy/piper-voices).

Audio samples are also available, so you can preview a voice before downloading it.

A model usually consists of a *.onnx file and a *.onnx.json file. Download both to the Docker volume working directory:

mkdir -p ./workdir/piper_tts
cd ./workdir/piper_tts
wget "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/hfc_female/medium/en_US-hfc_female-medium.onnx"
wget "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/hfc_female/medium/en_US-hfc_female-medium.onnx.json"
cd -
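Because both files are needed, a quick check like the following can list the voices whose download is complete. It is a hypothetical helper that only assumes the <voice>.onnx / <voice>.onnx.json naming shown in the URLs above.

```python
from pathlib import Path
from typing import List


def find_piper_voices(voices_dir: Path) -> List[str]:
    """Return the voice names in voices_dir that have both the .onnx model
    and its .onnx.json configuration, e.g. "en_US-hfc_female-medium"."""
    voices = []
    # "*.onnx" does not match "*.onnx.json", so only model files are listed here
    for model in sorted(voices_dir.glob("*.onnx")):
        config = model.with_name(model.name + ".json")
        if config.is_file():
            voices.append(model.stem)
        else:
            print(f"Missing {config.name}: download it next to {model.name}")
    return voices
```

After the downloads above, `find_piper_voices(Path("./workdir/piper_tts"))` returns `["en_US-hfc_female-medium"]`.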

Platypush Configuration

Copy and edit the example configuration file:

cp config/config.example.yaml config/config.yaml

Build

Build the container image for the assistant service:

docker build -t platypush-voice .

Run

The assistant needs access to the host microphone and speakers. The container routes ALSA through PulseAudio, so the examples below connect it to a PulseAudio server running on the host.

Linux

With PulseAudio installed:

docker run --rm \
  -e PULSE_SERVER=unix:/run/pulse/native \
  -v /run/user/$(id -u)/pulse/native:/run/pulse/native \
  --name voice-assistant \
  -p 8008:8008 \
  -v "$(pwd)/config:/etc/platypush" \
  -v "$(pwd)/workdir:/var/lib/platypush" \
  platypush-voice

macOS

Install and start PulseAudio on the host:

brew install pulseaudio
pulseaudio --daemonize=yes --exit-idle-time=-1
pactl load-module module-native-protocol-tcp \
  auth-anonymous=1 \
  listen=0.0.0.0 \
  port=4713

Then start the container:

docker run --rm \
  -e PULSE_SERVER=tcp:host.docker.internal:4713 \
  --name voice-assistant \
  -p 8008:8008 \
  -v "$(pwd)/config:/etc/platypush" \
  -v "$(pwd)/workdir:/var/lib/platypush" \
  platypush-voice

If pactl load-module reports that the module is already loaded, you can keep using the existing PulseAudio daemon.

Windows

Install PulseAudio for Windows, then create a default.pa file in the same directory as pulseaudio.exe:

load-module module-waveout sink_name=output source_name=input record=1
load-module module-native-protocol-tcp auth-anonymous=1 listen=0.0.0.0 port=4713
set-default-sink output
set-default-source input

Start PulseAudio from PowerShell:

.\pulseaudio.exe -F .\default.pa --exit-idle-time=-1

Then start the container from the repository directory:

docker run --rm `
  -e PULSE_SERVER=tcp:host.docker.internal:4713 `
  --name voice-assistant `
  -p 8008:8008 `
  -v "${PWD}/config:/etc/platypush" `
  -v "${PWD}/workdir:/var/lib/platypush" `
  platypush-voice

Make sure microphone access is enabled for desktop applications under Windows privacy settings, and allow PulseAudio through the firewall if prompted.
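If the assistant starts but does not hear or speak, one thing worth ruling out is a PULSE_SERVER value that points at an unreachable server. The sketch below parses the two address forms used in this section (unix:<socket path> and tcp:<host>:<port>, falling back to 4713, the PulseAudio native protocol default, when the port is omitted) and tries to open a connection; other PULSE_SERVER syntaxes are not handled. Since Platypush runs on Python, a Python interpreter should be available inside the container to run it with the same environment.

```python
import socket
from typing import Tuple, Union

DEFAULT_PULSE_PORT = 4713  # PulseAudio native protocol default port

Address = Union[str, Tuple[str, int]]


def parse_pulse_server(value: str) -> Tuple[str, Address]:
    """Split "unix:/path" or "tcp:host[:port]" into (kind, address)."""
    kind, _, rest = value.strip().partition(":")
    if kind == "unix" and rest:
        return "unix", rest
    if kind == "tcp" and rest:
        host, sep, port = rest.rpartition(":")
        if not sep:  # no port given: use the default one
            return "tcp", (rest, DEFAULT_PULSE_PORT)
        return "tcp", (host, int(port))
    raise ValueError(f"Unsupported PULSE_SERVER value: {value!r}")


def pulse_server_reachable(value: str, timeout: float = 2.0) -> bool:
    """Return True if something accepts connections at the PULSE_SERVER address."""
    kind, address = parse_pulse_server(value)
    try:
        if kind == "unix":
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                sock.connect(address)
        else:
            with socket.create_connection(address, timeout=timeout):
                pass
    except OSError:
        return False
    return True
```

`pulse_server_reachable(os.environ["PULSE_SERVER"])` returns False when nothing accepts connections at that address. A successful connection only shows the socket is reachable, not that authentication or device selection will work.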

Usage

Once the service is running, you can start interacting with it through voice commands (the default activation word is "Alexa").

If the weather plugin is enabled, it will answer any questions about the weather.

If the music or lights plugins are enabled, you can control them with voice commands ("stop the music", "turn on the lights", etc.).

Otherwise, the assistant uses the openai plugin to answer your questions, with follow-up turns when OpenAI's response is itself a question.
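The precedence described above can be modeled in a few lines of plain Python: try the preset commands first, and fall back to the AI backend only when none of them matches. This is a simplified, hypothetical model (the `COMMANDS` table, `route` and the `ask_ai` callback are illustrative names), not the code in config/scripts, and it leaves out follow-up turns.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical preset commands as (plain regex, Platypush action) pairs,
# mirroring the sample hooks; the real routing lives in config/scripts.
COMMANDS: List[Tuple[str, str]] = [
    (r"turn on (the )?lights", "light.hue.on"),
    (r"play (the )?music", "music.mpd.play"),
]


def route(phrase: str, ask_ai: Callable[[str], str]) -> Tuple[str, str]:
    """Return ("action", <action>) for the first matching preset command,
    otherwise ("ai", <answer>) from the fallback AI backend."""
    normalized = " ".join(phrase.lower().split())  # lowercase, collapse whitespace
    for pattern, action in COMMANDS:
        if re.fullmatch(pattern, normalized):
            return "action", action
    return "ai", ask_ai(normalized)
```

For example, `route("Turn on the lights", ask)` returns `("action", "light.hue.on")` without calling `ask`, while a general question is passed to `ask` and its answer is returned as `("ai", answer)`.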

Extending the Assistant

The assistant logic is implemented as simple Platypush hooks under config/scripts.

You can extend it by defining your own hooks or modifying the existing ones.

The API is relatively straightforward:

import logging

from platypush import run, when
from platypush.events.assistant import (
    ConversationEndEvent,
    ConversationStartEvent,
    HotwordDetectedEvent,
    SpeechRecognizedEvent,
)

logger = logging.getLogger(__name__)
ai_plugin = "openai"
assistant_plugin = "assistant.vosk"


@when(HotwordDetectedEvent)
def on_hotword_detected(event: HotwordDetectedEvent):
    """
    Something to do when the hotword is detected.
    """
    logger.info("Hotword %s detected", event.hotword)
    run(f"{assistant_plugin}.start_conversation")


@when(ConversationStartEvent)
def on_conversation_start():
    """
    Something to do when a conversation starts.
    """
    logger.info("Voice assistant conversation started")


@when(SpeechRecognizedEvent, plugin=assistant_plugin)
def on_speech_recognized(event: SpeechRecognizedEvent):
    """
    Generic handler for speech recognition events received
    by the configured assistant plugin.
    """
    logger.info("Recognized speech: %s", event.phrase)

    # Forward the request to OpenAI and render the response as speech
    response = run(
        f"{ai_plugin}.get_response",
        prompt=event.phrase,
        context=[
            {
                "role": "system",
                "content": (
                    "You are a voice assistant that can answer questions and perform actions. "
                    "Keep in mind that prompts are transcriptions of user speech and they may "
                    "contain misspellings or errors. Try and interpret them as best as possible. "
                    "When possible, keep your answers short and concise."
                ),
            }
        ],
    )

    # If the response is not empty, render it using the TTS plugin
    if response:
        event.assistant.render_response(response)


@when(SpeechRecognizedEvent, phrase="turn on (the)? lights")
def turn_on_lights():
    """
    Hook run when the user says "turn on the lights" (regex)
    """
    run("light.hue.on")


@when(SpeechRecognizedEvent, phrase="play (the)? music")
def play_music():
    """
    Hook run when the user says "play the music" (regex)
    """
    run("music.mpd.play")


@when(SpeechRecognizedEvent, phrase="set the music volume (to|on|at) ${volume}")
def set_volume(volume: int):
    """
    Hook run when the user says "set the music volume to ${volume}"
    (regex with parameter).
    """
    run("music.mpd.set_volume", volume=volume)
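To illustrate what a `${volume}` placeholder does, here is a hypothetical plain-`re` translation that turns each placeholder into a named capture group and matches the whole phrase case-insensitively. Platypush's own phrase matching may differ in details such as whitespace handling, so treat this as a model of the idea rather than its implementation.

```python
import re
from typing import Dict, Optional

# Matches ${name} placeholders inside a phrase template
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def template_to_regex(template: str) -> "re.Pattern":
    """Turn "set the music volume (to|on|at) ${volume}" into a regex whose
    ${...} placeholders become named groups; the rest is kept as regex."""
    pattern = _PLACEHOLDER.sub(lambda m: f"(?P<{m.group(1)}>.+?)", template)
    return re.compile(rf"^\s*{pattern}\s*$", re.IGNORECASE)


def match_phrase(template: str, phrase: str) -> Optional[Dict[str, str]]:
    """Return the captured parameters, or None if the phrase does not match."""
    m = template_to_regex(template).match(phrase)
    return m.groupdict() if m else None
```

Captured values are strings, and a speech recognizer may spell numbers out ("fifty" rather than "50"), so a handler such as `set_volume` may need to convert the value before passing it on.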