Added proper README and examples

2025-07-04 04:38:08 +02:00 · 2020-10-28 18:12:19 +01:00 · 2020-10-28 18:12:19 +01:00 · d867880199
commit d867880199
parent 2f578929fb
16 changed files with 764 additions and 120 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,6 @@
 /data/
 /models/
 __pycache__
+/build
+/dist
+*.egg-info
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -0,0 +1,22 @@
+MIT License
+
+Copyright (c) 2017, 2020 Fabio Manganiello
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
--- a/README.md
+++ b/README.md
@@ -0,0 +1,313 @@
+micmon
+======
+
+*micmon* is an ML-powered library to detect sounds in an audio stream,
+either from a file or from an audio input. It was originally developed
+to build a DIY baby monitor that detects the cries of my newborn
+through a Raspberry Pi and a USB microphone, but it should be good enough
+to detect any type of noise or audio if used with a well-trained
+model.
+
+It works by splitting an audio stream into short segments, calculating the
+FFT and spectrum bins for each of these segments, and using that spectrum
+data to train a model to detect the audio. It works well with sounds that are
+loud enough to stand out from the background (it's good at detecting e.g. the
+sound of an alarm clock, not the sound of a flying mosquito), that are long
+enough compared to the size of the chunks (very short sounds will leave a
+very small trace in the spectrum of an audio chunk) and, even better, whose
+frequency bandwidth doesn't overlap much with other sounds (it's good
+at detecting the cries of your baby, since his/her voice has a higher pitch
+than yours, but it may not detect differences in the spectral signature of
+the voices of two adult men in the same age group). It's not going to perform
+very well if you are trying to use it to detect speech: since it operates
+on time-agnostic frequency data from chunks of audio, it's not granular enough
+for proper speech-to-text applications, and it wouldn't be robust enough to
+detect differences in voice pitch, tone or accent.
+
+Dependencies
+------------
+
+The software uses *ffmpeg* to record and decode audio - check instructions for
+your OS on how to get it installed. It also requires *lame* or any other mp3
+encoder to encode captured audio to mp3.
+
+Python dependencies:
+
+```bash
+pip install numpy tensorflow keras
+
+# Optional, for graphs
+pip install matplotlib
+```
+
+Installation
+------------
+
+```bash
+git clone https://github.com/BlackLight/micmon
+cd micmon
+python setup.py install
+```
+
+Audio capture
+-------------
+
+Once the software is installed, you can proceed with recording some audio that
+will be used for training the model. First create a directory for your audio
+samples dataset:
+
+```bash
+# This folder will store our audio samples
+mkdir -p ~/datasets/sound-detect/audio
+
+# This folder will store the datasets
+# generated from the labelled audio samples
+mkdir -p ~/datasets/sound-detect/data
+
+# This folder will store the generated
+# Tensorflow models
+mkdir -p ~/models
+
+cd ~/datasets/sound-detect/audio
+```
+
+Then create a new sub-folder for your first audio sample and start recording.
+Example:
+
+```bash
+mkdir sample_1
+cd sample_1
+arecord -D plughw:0,1 -f cd | lame - audio.mp3
+```
+
+In the example above we are using *arecord* to record from the second device
+of the first sound card (check a list of available recording devices with
+*arecord -l*) in WAV format, and we are then using the *lame* encoder to
+convert the raw audio to mp3. When done with recording, just Ctrl-C the
+application and your audio file will be ready.
+
+Audio labelling
+---------------
+
+In the same directory as your sample (in the example above it will be
+`~/datasets/sound-detect/audio/sample_1`) create a new file named
+`labels.json`. Now open your audio file in Audacity or any audio player
+and identify the audio segments that match your criteria - for example
+when your baby is crying, when the alarm starts, when your neighbour
+starts drilling the wall, or whatever your criterion is. `labels.json`
+should contain a key-value mapping in the form of `start_time -> label`.
+Example:
+
+```json
+{
+  "00:00": "negative",
+  "02:13": "positive",
+  "04:57": "negative",
+  "15:41": "positive",
+  "18:24": "negative"
+}
+```
+
+In the example above, all the audio segments between 00:00 and 02:12 will
+be labelled as negative, all the segments between 02:13 and 04:56 as
+positive, and so on.
+
+You can now use *micmon* to generate a frequency spectrum dataset out of
+your labelled audio. You can do it either through the `micmon-datagen`
+script or with your own script.
+
+### micmon-datagen
+
+Type `micmon-datagen --help` to get a full list of the available options.
+In general, `micmon-datagen` requires a directory that contains the labelled
+audio samples sub-directories as input and a directory where the calculated
+numpy-compressed datasets will be stored. If you want to generate the dataset
+for the audio samples captured in the previous step, then the command
+will be something like this:
+
+```bash
+micmon-datagen --low 250 --high 7500 --bins 100 --sample-duration 2 --channels 1 \
+    ~/datasets/sound-detect/audio ~/datasets/sound-detect/data
+```
+
+The `--low` and `--high` options respectively identify the lowest and highest
+frequencies that should be taken into account in the output spectrum. By default
+these values are 20 Hz and 20 kHz (respectively the lowest and highest frequency
+audible to a healthy and young human ear), but you can narrow down the frequency
+space to only detect the frequencies that you're interested in and to remove
+high-frequency harmonics that may spoil your data. A good way to estimate the
+frequency space is to use e.g. Audacity or any audio equalizer to select the
+segments of your audio that contain the sounds that you want to detect and
+check their dominant frequencies - you definitely want those frequencies to be
+included in your range.
+
+`--bins` specifies in how many segments/buckets the frequency spectrum should
+be split - 100 bins is the default value. `--sample-duration` specifies the
+duration in seconds for each spectrum data point - 2 seconds is the default
+value, i.e. the audio samples will be read in chunks of 2 seconds each and the
+spectrum will be calculated for each of these chunks. If the sounds you want to
+detect are shorter than that, you may want to reduce this value.
+
+### Generate the dataset via script
+
+The other way to generate the dataset from the audio is through the *micmon* API
+itself. This option also enables you to take a peek at the audio data to better
+calibrate the parameters. For example:
+
+```python
+import os
+
+from micmon.audio import AudioDirectory, AudioPlayer, AudioFile
+from micmon.dataset import DatasetWriter
+
+basedir = os.path.expanduser('~/datasets/sound-detect')
+audio_dir = os.path.join(basedir, 'audio')
+datasets_dir = os.path.join(basedir, 'data')
+cutoff_frequencies = [250, 7500]
+
+# Scan the base audio_dir for labelled audio samples
+audio_dirs = AudioDirectory.scan(audio_dir)
+
+# Play some audio samples starting from 01:00
+for audio_dir in audio_dirs:
+    with AudioFile(audio_dir, start='01:00', duration=5) as reader, \
+            AudioPlayer() as player:
+        for sample in reader:
+            player.play(sample)
+
+# Plot the audio and spectrum of the audio samples in the first 10 seconds
+# of each audio file.
+for audio_dir in audio_dirs:
+    with AudioFile(audio_dir, start=0, duration=10) as reader:
+        for sample in reader:
+            sample.plot_audio()
+            sample.plot_spectrum(low_freq=cutoff_frequencies[0],
+                                 high_freq=cutoff_frequencies[1])
+
+# Save the spectrum information and labels of the samples to a
+# different compressed file for each audio file.
+for audio_dir in audio_dirs:
+    dataset_file = os.path.join(datasets_dir, os.path.basename(audio_dir.path) + '.npz')
+    print(f'Processing audio sample {audio_dir.path}')
+
+    with AudioFile(audio_dir) as reader, \
+            DatasetWriter(dataset_file,
+                          low_freq=cutoff_frequencies[0],
+                          high_freq=cutoff_frequencies[1]) as writer:
+        for sample in reader:
+            writer += sample
+
+```
+
+Training the model
+------------------
+
+Once you have some `.npz` datasets saved under `~/datasets/sound-detect/data`, you can
+use those datasets to train a Tensorflow+Keras model to classify an audio segment. A full
+example is available under `examples/train.py`:
+
+```python
+import os
+from keras import layers
+
+from micmon.dataset import Dataset
+from micmon.model import Model
+
+# This is a directory that contains the saved .npz dataset files
+datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')
+
+# This is the output directory where the model will be saved
+model_dir = os.path.expanduser('~/models/sound-detect')
+
+# This is the number of training epochs for each dataset sample
+epochs = 2
+
+# Load the datasets from the compressed files.
+# 70% of the data points will be included in the training set,
+# 30% of the data points will be included in the evaluation set
+# and used to evaluate the performance of the model.
+datasets = Dataset.scan(datasets_dir, validation_split=0.3)
+labels = ['negative', 'positive']
+freq_bins = len(datasets[0].samples[0])
+
+# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
+# The first intermediate layer in this example will have twice the number of units as the number
+# of input units, while the second intermediate layer will have 75% of the number of
+# input units. We also specify the names for the labels and the low and high frequency range
+# used when sampling.
+model = Model(
+    [
+        layers.Input(shape=(freq_bins,)),
+        layers.Dense(int(2 * freq_bins), activation='relu'),
+        layers.Dense(int(0.75 * freq_bins), activation='relu'),
+        layers.Dense(len(labels), activation='softmax'),
+    ],
+    labels=labels,
+    low_freq=datasets[0].low_freq,
+    high_freq=datasets[0].high_freq
+)
+
+# Train the model
+for epoch in range(epochs):
+    for i, dataset in enumerate(datasets):
+        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
+        model.fit(dataset)
+        evaluation = model.evaluate(dataset)
+        print(f'Validation set loss and accuracy: {evaluation}')
+
+# Save the model
+model.save(model_dir, overwrite=True)
+```
+
+At the end of the process you should find your Tensorflow model saved under `~/models/sound-detect`.
+You can use it in your scripts to classify audio samples from audio sources.
+
+Classifying audio samples
+-------------------------
+
+One use case is to analyze an audio file and use the model to detect specific sounds. Example:
+
+```python
+import os
+
+from micmon.audio import AudioFile
+from micmon.model import Model
+
+model_dir = os.path.expanduser('~/models/sound-detect')
+model = Model.load(model_dir)
+cur_seconds = 60
+sample_duration = 2
+
+with AudioFile('/path/to/some/audio.mp3',
+               start=cur_seconds, duration='10:00',
+               sample_duration=sample_duration) as reader:
+    for sample in reader:
+        prediction = model.predict(sample)
+        print(f'Audio segment at {cur_seconds} seconds: {prediction}')
+        cur_seconds += sample_duration
+```
+
+Another is to analyze live audio samples imported from an audio device - e.g. a USB microphone.
+Example:
+
+```python
+import os
+
+from micmon.audio import AudioDevice
+from micmon.model import Model
+
+model_dir = os.path.expanduser('~/models/sound-detect')
+model = Model.load(model_dir)
+audio_system = 'alsa'        # Supported: alsa and pulse
+audio_device = 'plughw:1,0'  # Get list of recognized input devices with arecord -l
+
+with AudioDevice(audio_system, device=audio_device) as source:
+    for sample in source:
+        source.pause()  # Pause recording while we process the frame
+        prediction = model.predict(sample)
+        print(prediction)
+        source.resume() # Resume recording
+```
+
+You can use these two examples as blueprints to set up your own automation routines
+with sound detection.
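
Illustrative note (not part of the commit): the README above describes splitting audio into chunks and reducing each chunk to a fixed number of spectrum bins between a low and a high cutoff frequency. The sketch below shows one way that reduction could look using only the standard library. It is not micmon's implementation: `dft_magnitudes` and `spectrum_bins` are hypothetical helpers, the naive DFT stands in for a real FFT routine (such as `numpy.fft.rfft`), and averaging the magnitudes per bin is an assumption.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns the magnitudes for coefficients 0..n/2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def spectrum_bins(samples, sample_rate, low_freq, high_freq, bins):
    """Average the magnitudes between low_freq and high_freq into `bins` buckets."""
    n = len(samples)
    magnitudes = dft_magnitudes(samples)
    # The k-th DFT coefficient corresponds to the frequency k * sample_rate / n (Hz)
    selected = [m for k, m in enumerate(magnitudes)
                if low_freq <= k * sample_rate / n <= high_freq]
    size = len(selected) / bins
    result = []
    for b in range(bins):
        chunk = selected[int(b * size):int((b + 1) * size)]
        result.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return result


# A 1 kHz tone sampled at 8 kHz (256 samples, i.e. a 32 ms chunk)
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
spectrum = spectrum_bins(tone, sample_rate, low_freq=250, high_freq=3500, bins=10)

# Index of the bin with the highest average magnitude
loudest = max(range(len(spectrum)), key=spectrum.__getitem__)
print(f'{len(spectrum)} bins, loudest bin index: {loudest}')
```

Narrowing `low_freq` and `high_freq` around the sound of interest, as the `--low` and `--high` options do, spends the same number of bins on a smaller frequency range, so each bin covers fewer frequencies.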
--- a/examples/predict_from_audio_file.py
+++ b/examples/predict_from_audio_file.py
@@ -0,0 +1,16 @@
+import os
+
+from micmon.audio import AudioFile
+from micmon.model import Model
+
+model_dir = os.path.expanduser('~/models/sound-detect')
+model = Model.load(model_dir)
+cur_seconds = 60
+sample_duration = 2
+
+with AudioFile('/path/to/some/audio.mp3', start=cur_seconds, duration='10:00',
+               sample_duration=sample_duration) as reader:
+    for sample in reader:
+        prediction = model.predict(sample)
+        print(f'Audio segment at {cur_seconds} seconds: {prediction}')
+        cur_seconds += sample_duration
--- a/examples/predict_from_microphone.py
+++ b/examples/predict_from_microphone.py
@@ -0,0 +1,18 @@
+import os
+
+from micmon.audio import AudioDevice
+from micmon.model import Model
+
+# Path to a previously saved sound detection Tensorflow model
+model_dir = os.path.expanduser('~/models/sound-detect')
+model = Model.load(model_dir)
+
+audio_system = 'alsa'        # Supported: alsa and pulse
+audio_device = 'plughw:1,0'  # Get list of recognized input devices with arecord -l
+
+with AudioDevice(audio_system, device=audio_device) as source:
+    for sample in source:
+        source.pause()  # Pause recording while we process the frame
+        prediction = model.predict(sample)
+        print(prediction)
+        source.resume() # Resume recording
--- a/examples/train.py
+++ b/examples/train.py
@@ -0,0 +1,54 @@
+# This script shows how to train a neural network to detect sounds given a training set of collected frequency
+# spectrum data.
+
+import os
+from keras import layers
+
+from micmon.dataset import Dataset
+from micmon.model import Model
+
+# This is a directory that contains the saved .npz dataset files
+datasets_dir = os.path.expanduser('~/datasets/baby-monitor/datasets')
+
+# This is the output directory where the model will be saved
+model_dir = os.path.expanduser(os.path.join('~', 'models', 'baby-monitor'))
+
+# This is the number of training epochs for each dataset sample
+epochs = 2
+
+# This value establishes the share of the dataset to be used for cross-validation
+validation_split = 0.3
+
+# Load the datasets from the compressed files
+datasets = Dataset.scan(datasets_dir, validation_split=validation_split)
+
+# Get the number of frequency bins
+freq_bins = len(datasets[0].samples[0])
+
+# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
+# The first intermediate layer in this example will have twice the number of units as the number
+# of input units, while the second intermediate layer will have as many units as the number of
+# input units. We also specify the names for the labels and the low and high frequency range
+# used when sampling.
+model = Model(
+    [
+        layers.Input(shape=(freq_bins,)),
+        layers.Dense(int(2.0 * freq_bins), activation='relu'),
+        layers.Dense(int(freq_bins), activation='relu'),
+        layers.Dense(len(datasets[0].labels), activation='softmax'),
+    ],
+    labels=['negative', 'positive'],
+    low_freq=datasets[0].low_freq,
+    high_freq=datasets[0].high_freq,
+)
+
+# Train the model
+for epoch in range(epochs):
+    for i, dataset in enumerate(datasets):
+        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
+        model.fit(dataset)
+        evaluation = model.evaluate(dataset)
+        print(f'Validation set loss and accuracy: {evaluation}')
+
+# Save the model
+model.save(model_dir, overwrite=True)
--- a/micmon/__init__.py
+++ b/micmon/__init__.py
@@ -1,4 +1,4 @@
 import logging
 import sys

-logging.basicConfig(level=logging.DEBUG, stream=sys.stdout)
+logging.basicConfig(level=logging.INFO, stream=sys.stdout)
--- a/micmon/audio/file.py
+++ b/micmon/audio/file.py
@@ -1,24 +1,40 @@
 import json
+import os
+import pathlib
 from typing import Optional, List, Tuple, Union

-from micmon.audio import AudioDirectory, AudioSegment, AudioSource
+from micmon.audio import AudioSegment, AudioSource, AudioDirectory


 class AudioFile(AudioSource):
-    def __init__(self, path: AudioDirectory,
+    def __init__(self,
+                 audio_file: Union[str, AudioDirectory],
+                 labels_file: Optional[str] = None,
                 start: Union[str, int, float] = 0,
                 duration: Optional[Union[str, int, float]] = None,
                 *args, **kwargs):
        super().__init__(*args, **kwargs)
+        if isinstance(audio_file, AudioDirectory):
+            # Read both attributes before rebinding audio_file to a plain path
+            audio_file, labels_file = audio_file.audio_file, audio_file.labels_file
+
+        self.audio_file = os.path.abspath(os.path.expanduser(audio_file))
+
+        if not labels_file:
+            labels_file = os.path.join(pathlib.Path(self.audio_file).parent, 'labels.json')
+            if not os.path.isfile(labels_file):
+                labels_file = None
+
+        self.labels_file = os.path.abspath(os.path.expanduser(labels_file)) if labels_file else None
        self.ffmpeg_args = (
-            self.ffmpeg_bin, '-i', path.audio_file, *(('-ss', str(start)) if start else ()),
+            self.ffmpeg_bin, '-i', self.audio_file, *(('-ss', str(start)) if start else ()),
            *(('-t', str(duration)) if duration else ()), *self.ffmpeg_base_args
        )

        self.start = self.convert_time(start)/1000
        self.duration = self.convert_time(duration)/1000
-        self.segments = self.parse_labels_file(path.labels_file) \
-            if path.labels_file else []
+        self.segments = self.parse_labels_file(self.labels_file) \
+            if self.labels_file else []

        self.labels = sorted(list(set(label for timestamp, label in self.segments)))
        self.cur_time = self.start
@@ -53,4 +69,3 @@ class AudioFile(AudioSource):
            return audio

        raise StopIteration
-
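
Illustrative note (not part of the commit): the new `AudioFile` constructor falls back to a `labels.json` file next to the audio file when no labels file is passed. The standalone sketch below mirrors that resolution order so it can be tried without ffmpeg or micmon installed. `resolve_labels_file` is a hypothetical helper, not a micmon function.

```python
import os
import pathlib
import tempfile
from typing import Optional


def resolve_labels_file(audio_file: str,
                        labels_file: Optional[str] = None) -> Optional[str]:
    """Return the labels file to use for audio_file, or None if there is none."""
    audio_file = os.path.abspath(os.path.expanduser(audio_file))

    if labels_file:
        # An explicit labels file always wins
        return os.path.abspath(os.path.expanduser(labels_file))

    # Otherwise look for labels.json in the same directory as the audio file
    candidate = pathlib.Path(audio_file).parent / 'labels.json'
    return str(candidate) if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    audio = os.path.join(tmp, 'audio.mp3')

    # No labels.json yet: the audio is treated as unlabelled
    before = resolve_labels_file(audio)

    # Once labels.json exists next to the audio file, it is picked up
    pathlib.Path(tmp, 'labels.json').write_text('{"00:00": "negative"}')
    after = resolve_labels_file(audio)

print(before, after)
```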
--- a/micmon/model/__init__.py
+++ b/micmon/model/__init__.py
@@ -2,7 +2,7 @@ import json
 import os
 import numpy as np

-from typing import List, Optional, Union, Tuple
+from typing import List, Optional, Union
 from keras import Sequential, losses, optimizers, metrics
 from keras.layers import Layer
 from keras.models import load_model, Model as _Model
@@ -20,10 +20,11 @@ class Model:
                 model: Optional[_Model] = None, optimizer: Union[str, optimizers.Optimizer] = 'adam',
                 loss: Union[str, losses.Loss] = losses.SparseCategoricalCrossentropy(from_logits=True),
                 metrics: List[Union[str, metrics.Metric]] = ('accuracy',),
-                 cutoff_frequencies: Tuple[int, int] = (AudioSegment.default_low_freq, AudioSegment.default_high_freq)):
+                 low_freq: int = AudioSegment.default_low_freq,
+                 high_freq: int = AudioSegment.default_high_freq):
        assert layers or model
        self.label_names = labels
-        self.cutoff_frequencies = list(map(int, cutoff_frequencies))
+        self.cutoff_frequencies = (int(low_freq), int(high_freq))

        if layers:
            self._model = Sequential(layers)
@@ -74,4 +75,4 @@ class Model:
            with open(freq_file, 'r') as f:
                frequencies = json.load(f)

-        return cls(model=model, labels=label_names, cutoff_frequencies=frequencies)
+        return cls(model=model, labels=label_names, low_freq=frequencies[0], high_freq=frequencies[1])
--- a/micmon/utils/__init__.py
+++ b/micmon/utils/__init__.py
--- a/micmon/utils/datagen.py
+++ b/micmon/utils/datagen.py
@@ -0,0 +1,117 @@
+import argparse
+import logging
+import os
+import sys
+
+from micmon.audio import AudioDirectory, AudioFile, AudioSegment
+from micmon.dataset import DatasetWriter
+
+logger = logging.getLogger(__name__)
+defaults = {
+    'sample_duration': 2.0,
+    'sample_rate': 44100,
+    'channels': 1,
+    'ffmpeg_bin': 'ffmpeg',
+}
+
+
+def create_dataset(audio_dir: str, dataset_dir: str,
+                   low_freq: int = AudioSegment.default_low_freq,
+                   high_freq: int = AudioSegment.default_high_freq,
+                   bins: int = AudioSegment.default_bins,
+                   sample_duration: float = defaults['sample_duration'],
+                   sample_rate: int = defaults['sample_rate'],
+                   channels: int = defaults['channels'],
+                   ffmpeg_bin: str = defaults['ffmpeg_bin']):
+    audio_dir = os.path.abspath(os.path.expanduser(audio_dir))
+    dataset_dir = os.path.abspath(os.path.expanduser(dataset_dir))
+    audio_dirs = AudioDirectory.scan(audio_dir)
+
+    for audio_dir in audio_dirs:
+        dataset_file = os.path.join(dataset_dir, os.path.basename(audio_dir.path) + '.npz')
+        logger.info(f'Processing audio sample {audio_dir.path}')
+
+        with AudioFile(audio_dir.audio_file, audio_dir.labels_file,
+                       sample_duration=sample_duration, sample_rate=sample_rate, channels=channels,
+                       ffmpeg_bin=os.path.expanduser(ffmpeg_bin)) as reader, \
+                DatasetWriter(dataset_file, low_freq=low_freq, high_freq=high_freq, bins=bins) as writer:
+            for sample in reader:
+                writer += sample
+
+
+def main():
+    # noinspection PyTypeChecker
+    parser = argparse.ArgumentParser(
+        description='''
+Tool to create numpy dataset files with audio spectrum data from a set of labelled raw audio files.''',
+
+        epilog='''
+- audio_dir should contain a list of sub-directories, each of which represents a labelled audio sample.
+  audio_dir should have the following structure:
+
+  audio_dir/
+    -> train_sample_1
+      -> audio.mp3
+      -> labels.json
+    -> train_sample_2
+      -> audio.mp3
+      -> labels.json
+  ...
+
+- labels.json is a key-value JSON file that contains the labels for each audio segment. Example:
+
+   {
+     "00:00": "negative",
+     "02:13": "positive",
+     "04:57": "negative",
+     "15:41": "positive",
+     "18:24": "negative"
+   }
+
+  Each entry indicates that all the audio samples between the specified timestamp and the next entry or
+  the end of the audio file are assigned the specified label.
+
+- dataset_dir is the directory where the generated labelled spectrum dataset in .npz format will be saved.
+  Each dataset file will be named like its associated audio samples directory.''',
+
+        formatter_class=argparse.RawDescriptionHelpFormatter
+    )
+
+    parser.add_argument('audio_dir', help='Directory containing the raw audio samples directories to be scanned.')
+    parser.add_argument('dataset_dir', help='Destination directory for the compressed .npz files containing the '
+                                            'frequency spectrum datasets.')
+    parser.add_argument('--low', help='Specify the lowest frequency to be considered in the generated frequency '
+                                      'spectrum. Default: 20 Hz (lowest possible frequency audible to a human ear).',
+                        required=False, default=AudioSegment.default_low_freq, dest='low_freq', type=int)
+
+    parser.add_argument('--high', help='Specify the highest frequency to be considered in the generated frequency '
+                                       'spectrum. Default: 20 kHz (highest possible frequency audible to a human ear).',
+                        required=False, default=AudioSegment.default_high_freq, dest='high_freq', type=int)
+
+    parser.add_argument('-b', '--bins', help=f'Specify the number of frequency bins to be used for the spectrum '
+                                             f'analysis (default: {AudioSegment.default_bins})',
+                        required=False, default=AudioSegment.default_bins, dest='bins', type=int)
+
+    parser.add_argument('-d', '--sample-duration', help=f'The script will calculate the spectrum of audio segments of '
+                                                        f'this specified length in seconds (default: '
+                                                        f'{defaults["sample_duration"]}).',
+                        required=False, default=defaults['sample_duration'], dest='sample_duration', type=float)
+
+    parser.add_argument('-r', '--sample-rate', help=f'Audio sample rate (default: {defaults["sample_rate"]} Hz)',
+                        required=False, default=defaults['sample_rate'], dest='sample_rate', type=int)
+
+    parser.add_argument('-c', '--channels', help=f'Number of destination audio channels (default: '
+                                                 f'{defaults["channels"]})',
+                        required=False, default=defaults['channels'], dest='channels', type=int)
+
+    parser.add_argument('--ffmpeg', help=f'Absolute path to the ffmpeg executable (default: {defaults["ffmpeg_bin"]})',
+                        required=False, default=defaults['ffmpeg_bin'], dest='ffmpeg_bin', type=str)
+
+    opts, args = parser.parse_known_args(sys.argv[1:])
+    return create_dataset(audio_dir=opts.audio_dir, dataset_dir=opts.dataset_dir, low_freq=opts.low_freq,
+                          high_freq=opts.high_freq, bins=opts.bins, sample_duration=opts.sample_duration,
+                          sample_rate=opts.sample_rate, channels=opts.channels, ffmpeg_bin=opts.ffmpeg_bin)
+
+
+if __name__ == '__main__':
+    main()
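
Illustrative note (not part of the commit): the `labels.json` format described in the help text maps each start timestamp to the label that applies until the next entry. The sketch below shows one way to look up the label for a given position in the audio. `to_seconds` and `label_at` are hypothetical helpers, and micmon's own `parse_labels_file` may work differently.

```python
import bisect
import json
from typing import Dict, Optional


def to_seconds(timestamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(':'):
        seconds = seconds * 60 + int(part)
    return seconds


def label_at(labels: Dict[str, str], seconds: float) -> Optional[str]:
    """Return the label of the segment that contains the given position."""
    segments = sorted((to_seconds(t), label) for t, label in labels.items())
    starts = [start for start, _ in segments]
    # Index of the last segment that starts at or before `seconds`
    idx = bisect.bisect_right(starts, seconds) - 1
    return segments[idx][1] if idx >= 0 else None


labels = json.loads('''
{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}
''')

for position in (0, 132, 133, 296, 297, 2000):
    print(position, label_at(labels, position))
```

Sorting by the parsed timestamp rather than by the raw string keeps the lookup correct even if the JSON keys are not written in chronological order.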
--- a/notebooks/dataset.ipynb
+++ b/notebooks/dataset.ipynb
--- a/notebooks/predict.ipynb
+++ b/notebooks/predict.ipynb
@@ -46,9 +46,7 @@
    "from micmon.audio import AudioDevice, AudioPlayer\n",
    "from micmon.model import Model\n",
    "\n",
-    "basedir = os.path.expanduser(os.path.join('~', 'projects', 'baby-monitor'))\n",
-    "models_dir = os.path.join(basedir, 'models')\n",
-    "model_path = os.path.join(models_dir, 'baby-monitor')\n",
+    "model_dir = os.path.expanduser(os.path.join('~', 'models', 'baby-monitor'))\n",
    "audio_system = 'alsa'\n",
    "audio_device = 'plughw:3,0'\n",
    "label_names = ['negative', 'positive']"
@@ -68,7 +66,7 @@
   "execution_count": 2,
   "outputs": [],
   "source": [
-    "model = Model.load(model_path)"
+    "model = Model.load(model_dir)"
   ],
   "metadata": {
    "collapsed": false,
@@ -122,23 +120,17 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
+      "negative\n",
+      "negative\n",
+      "negative\n",
+      "negative\n",
+      "negative\n",
+      "negative\n",
      "negative\n",
      "negative\n",
      "negative\n",
      "negative\n"
     ]
-    },
-    {
-     "ename": "KeyboardInterrupt",
-     "evalue": "",
-     "output_type": "error",
-     "traceback": [
-      "\u001B[0;31m---------------------------------------------------------------------------\u001B[0m",
-      "\u001B[0;31mKeyboardInterrupt\u001B[0m                         Traceback (most recent call last)",
-      "\u001B[0;32m<ipython-input-3-27c83a302cb8>\u001B[0m in \u001B[0;36m<module>\u001B[0;34m\u001B[0m\n\u001B[1;32m      1\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mAudioDevice\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0maudio_system\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mdevice\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0maudio_device\u001B[0m\u001B[0;34m)\u001B[0m \u001B[0;32mas\u001B[0m \u001B[0msource\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m----> 2\u001B[0;31m     \u001B[0;32mfor\u001B[0m \u001B[0msample\u001B[0m \u001B[0;32min\u001B[0m \u001B[0msource\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m      3\u001B[0m         \u001B[0msource\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mpause\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m      4\u001B[0m         \u001B[0mprediction\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mmodel\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mpredict\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0msample\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m      5\u001B[0m         \u001B[0mprint\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mprediction\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
-      "\u001B[0;32m~/projects/baby-monitor/micmon/audio/source.py\u001B[0m in \u001B[0;36m__next__\u001B[0;34m(self)\u001B[0m\n\u001B[1;32m     36\u001B[0m             \u001B[0;32mraise\u001B[0m \u001B[0mStopIteration\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m     37\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m---> 38\u001B[0;31m         \u001B[0mdata\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mffmpeg\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mstdout\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mread\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mbufsize\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m     39\u001B[0m         \u001B[0;32mif\u001B[0m \u001B[0mdata\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m     40\u001B[0m             \u001B[0;32mreturn\u001B[0m \u001B[0mAudioSegment\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mdata\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0msample_rate\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0msample_rate\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mchannels\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mchannels\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
-      "\u001B[0;31mKeyboardInterrupt\u001B[0m: "
-     ]
    }
   ],
   "source": [
@@ -147,7 +139,7 @@
    "        source.pause()\n",
    "        prediction = model.predict(sample)\n",
    "        print(prediction)\n",
-    "        source.resume()"
+    "        source.resume()\n"
   ],
   "metadata": {
    "collapsed": false,
--- a/notebooks/train.ipynb
+++ b/notebooks/train.ipynb
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,3 @@
+numpy
+tensorflow
+keras
--- a/setup.py
+++ b/setup.py
@@ -0,0 +1,40 @@
+#!/usr/bin/env python
+import os
+
+from setuptools import setup, find_packages
+
+
+def path(fname=''):
+    return os.path.abspath(os.path.join(os.path.dirname(__file__), fname))
+
+
+def readfile(fname):
+    with open(path(fname)) as f:
+        return f.read()
+
+
+setup(
+    name="micmon",
+    version="0.1",
+    author="Fabio Manganiello",
+    author_email="info@fabiomanganiello.com",
+    description="Programmable Tensorflow-based sound/noise detector",
+    license="MIT",
+    python_requires='>= 3.6',
+    keywords="machine-learning tensorflow sound-detection",
+    url="https://github.com/BlackLight/micmon",
+    packages=find_packages(),
+    include_package_data=True,
+    long_description=readfile('README.md'),
+    long_description_content_type='text/markdown',
+    entry_points={
+        'console_scripts': [
+            'micmon-datagen=micmon.utils.datagen:main',
+        ],
+    },
+    classifiers=[
+        "Topic :: Utilities",
+        "License :: OSI Approved :: MIT License",
+        "Development Status :: 3 - Alpha",
+    ],
+)