Mirror of https://github.com/BlackLight/micmon.git (synced 2024-11-24 04:35:13 +01:00)

Commit d867880199: Added proper README and examples
Parent: 2f578929fb

16 changed files with 764 additions and 120 deletions

.gitignore (vendored, 3 lines added)

```diff
@@ -3,3 +3,6 @@
 /data/
 /models/
 __pycache__
+/build
+/dist
+*.egg-info
```

LICENSE.txt (new file, 22 lines)

```
MIT License

Copyright (c) 2017, 2020 Fabio Manganiello

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

README.md (new file, 313 lines)

micmon
======

*micmon* is an ML-powered library to detect sounds in an audio stream,
either from a file or from an audio input device. The use case for its
development has been the creation of a self-built baby monitor to detect the
cries of my newborn through a Raspberry Pi + USB microphone, but it should be
good enough to detect any type of noise or audio if used with a well-trained
model.

It works by splitting an audio stream into short segments, calculating the
FFT and spectrum bins for each of these segments, and using that spectrum
data to train a model to detect the target audio. It works well with sounds
that are loud enough to stand out from the background (it's good at detecting
e.g. the sound of an alarm clock, not the sound of a flying mosquito), that
are long enough compared to the size of the chunks (very short sounds will
leave a very small trace in the spectrum of an audio chunk) and, even better,
whose frequency bandwidth doesn't overlap a lot with other sounds (it's good
at detecting the cries of your baby, since his/her voice has a higher pitch
than yours, but it may not detect differences in the spectral signature of
the voices of two adult men in the same age group). It's not going to perform
very well if instead you try to use it to detect speech: since it operates
on time-agnostic frequency data from chunks of audio, it's not granular enough
for proper speech-to-text applications, and it wouldn't be robust enough to
detect differences in voice pitch, tone or accent.
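
To make the chunk-to-features step more concrete, here is a minimal sketch of
the idea (this is not micmon's actual implementation; the function name, the
bucketing strategy and the normalization below are simplifying assumptions):

```python
import numpy as np


def spectrum_bins(chunk: np.ndarray, sample_rate: int = 44100,
                  low_freq: int = 250, high_freq: int = 7500,
                  bins: int = 100) -> np.ndarray:
    # Magnitude spectrum of a short mono PCM chunk
    spectrum = np.abs(np.fft.rfft(chunk))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)

    # Keep only the frequency band of interest
    band = spectrum[(freqs >= low_freq) & (freqs <= high_freq)]

    # Reduce the band to a fixed number of buckets: a fixed-size vector of
    # this kind is what gets fed to the classifier.
    features = np.array([b.mean() for b in np.array_split(band, bins)])
    return features / max(features.max(), 1e-9)
```

Each labelled chunk then becomes one (feature vector, label) pair in the
training set.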

Dependencies
------------

The software uses *ffmpeg* to record and decode audio - check the instructions
for your OS on how to get it installed. It also requires *lame* or any other
mp3 encoder to encode the captured audio to mp3.

Python dependencies:

```bash
pip install numpy tensorflow keras

# Optional, for graphs
pip install matplotlib
```

Installation
------------

```bash
git clone https://github.com/BlackLight/micmon
cd micmon
python setup.py install
```

Audio capture
-------------

Once the software is installed, you can proceed with recording some audio that
will be used for training the model. First create a directory for your audio
samples dataset:

```bash
# This folder will store our audio samples
mkdir -p ~/datasets/sound-detect/audio

# This folder will store the datasets
# generated from the labelled audio samples
mkdir -p ~/datasets/sound-detect/data

# This folder will store the generated
# Tensorflow models
mkdir -p ~/models

cd ~/datasets/sound-detect/audio
```

Then create a new sub-folder for your first audio sample and start recording.
Example:

```bash
mkdir sample_1
cd sample_1
arecord -D plughw:0,1 -f cd | lame - audio.mp3
```

In the example above we are using *arecord* to record from the second device
of the first audio card (check the list of available recording devices with
*arecord -l*) in WAV format, and we are then using the *lame* encoder to
convert the raw audio to mp3. When you are done recording, just Ctrl-C the
application and your audio file will be ready.

Audio labelling
---------------

In the same directory as your sample (in the example above it will be
`~/datasets/sound-detect/audio/sample_1`) create a new file named
`labels.json`. Now open your audio file in Audacity or any audio player
and identify the audio segments that match your criteria - for example
when your baby is crying, when the alarm starts, when your neighbour
starts drilling the wall, or whatever your criteria are. `labels.json`
should contain a key-value mapping in the form of `start_time -> label`.
Example:

```json
{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}
```

In the example above, all the audio segments between 00:00 and 02:12 will
be labelled as negative, all the segments between 02:13 and 04:56 as
positive, and so on.
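
If you want to sanity-check a labels file, the mapping can be expanded into
explicit segments with a few lines of plain Python (this is just an
illustration, not part of micmon's API):

```python
import json

with open('labels.json') as f:
    labels = json.load(f)

# Sort the "MM:SS" timestamps and pair each one with the next: every label
# applies from its own timestamp up to the following one (or the end of file).
segments = sorted(labels.items(), key=lambda kv: [int(x) for x in kv[0].split(':')])
for (start, label), (end, _) in zip(segments, segments[1:] + [('end of file', '')]):
    print(f'{start} -> {end}: {label}')
```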

You can now use *micmon* to generate a frequency spectrum dataset out of
your labelled audio. You can do it either through the `micmon-datagen`
script or with your own script.

### micmon-datagen

Type `micmon-datagen --help` to get a full list of the available options.
In general, `micmon-datagen` requires a directory that contains the labelled
audio sample sub-directories as input and a directory where the calculated
numpy-compressed datasets will be stored. If you want to generate the dataset
for the audio samples captured in the previous step, the command will be
something like this:

```bash
micmon-datagen --low 250 --high 7500 --bins 100 --sample-duration 2 --channels 1 \
    ~/datasets/sound-detect/audio ~/datasets/sound-detect/data
```

The `--low` and `--high` options respectively identify the lowest and highest
frequencies that should be taken into account in the output spectrum. By default
these values are 20 Hz and 20 kHz (respectively the lowest and highest frequencies
audible to a healthy and young human ear), but you can narrow down the frequency
space to only detect the frequencies that you're interested in and to remove
high-frequency harmonics that may spoil your data. A good way to estimate the
frequency space is to use e.g. Audacity or any audio equalizer to select the
segments of your audio that contain the sounds that you want to detect and
check their dominant frequencies - you definitely want those frequencies to be
included in your range.
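
If you prefer to check the dominant frequencies programmatically, the same
micmon API used later in this README can plot the spectrum of a labelled
sample (this sketch assumes, as in the examples below, that
`AudioDirectory.scan` returns a list and that matplotlib is installed):

```python
import os

from micmon.audio import AudioDirectory, AudioFile

audio_dirs = AudioDirectory.scan(os.path.expanduser('~/datasets/sound-detect/audio'))

# Plot the spectrum of a few chunks around a known "positive" segment
# to eyeball which frequency range carries the signal.
with AudioFile(audio_dirs[0], start='02:13', duration=10) as reader:
    for sample in reader:
        sample.plot_spectrum(low_freq=20, high_freq=20000)
```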

`--bins` specifies how many segments/buckets the frequency spectrum should be
split into - 100 bins is the default value. `--sample-duration` specifies the
duration in seconds of each spectrum data point - 2 seconds is the default
value, i.e. the audio samples will be read in chunks of 2 seconds each and the
spectrum will be calculated for each of these chunks. If the sounds you want to
detect are shorter, then you may want to reduce this value.
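
The same parameters can also be passed programmatically through the
`create_dataset` helper that backs `micmon-datagen` (added in
`micmon/utils/datagen.py` in this commit). A sketch with shorter 0.5-second
chunks, using the directories created earlier in this README:

```python
from micmon.utils.datagen import create_dataset

# Equivalent of the micmon-datagen call above, but with 0.5-second chunks
# for shorter sounds. Both paths are expanded by the helper itself.
create_dataset('~/datasets/sound-detect/audio', '~/datasets/sound-detect/data',
               low_freq=250, high_freq=7500, bins=100,
               sample_duration=0.5, channels=1)
```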

### Generate the dataset via script

The other way to generate the dataset from the audio is through the *micmon* API
itself. This option also enables you to take a peek at the audio data to better
calibrate the parameters. For example:

```python
import os

from micmon.audio import AudioDirectory, AudioPlayer, AudioFile
from micmon.dataset import DatasetWriter

basedir = os.path.expanduser('~/datasets/sound-detect')
audio_dir = os.path.join(basedir, 'audio/sample_1')
datasets_dir = os.path.join(basedir, 'data')
cutoff_frequencies = [250, 7500]

# Scan the base audio_dir for labelled audio samples
audio_dirs = AudioDirectory.scan(audio_dir)

# Play some audio samples starting from 01:00
for audio_dir in audio_dirs:
    with AudioFile(audio_dir, start='01:00', duration=5) as reader, \
            AudioPlayer() as player:
        for sample in reader:
            player.play(sample)

# Plot the audio and spectrum of the audio samples in the first 10 seconds
# of each audio file.
for audio_dir in audio_dirs:
    with AudioFile(audio_dir, start=0, duration=10) as reader:
        for sample in reader:
            sample.plot_audio()
            sample.plot_spectrum(low_freq=cutoff_frequencies[0],
                                 high_freq=cutoff_frequencies[1])

# Save the spectrum information and labels of the samples to a
# different compressed file for each audio file.
for audio_dir in audio_dirs:
    dataset_file = os.path.join(datasets_dir, os.path.basename(audio_dir.path) + '.npz')
    print(f'Processing audio sample {audio_dir.path}')

    with AudioFile(audio_dir) as reader, \
            DatasetWriter(dataset_file,
                          low_freq=cutoff_frequencies[0],
                          high_freq=cutoff_frequencies[1]) as writer:
        for sample in reader:
            writer += sample
```

Training the model
------------------

Once you have some `.npz` datasets saved under `~/datasets/sound-detect/data`, you can
use those datasets to train a Tensorflow+Keras model to classify an audio segment. A full
example is available under `examples/train.py`:

```python
import os
from keras import layers

from micmon.dataset import Dataset
from micmon.model import Model

# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')

# This is the output directory where the model will be saved
model_dir = os.path.expanduser('~/models/sound-detect')

# This is the number of training epochs for each dataset sample
epochs = 2

# Load the datasets from the compressed files.
# 70% of the data points will be included in the training set,
# 30% of the data points will be included in the evaluation set
# and used to evaluate the performance of the model.
datasets = Dataset.scan(datasets_dir, validation_split=0.3)
labels = ['negative', 'positive']
freq_bins = len(datasets[0].samples[0])

# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have 75% of the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2 * freq_bins), activation='relu'),
        layers.Dense(int(0.75 * freq_bins), activation='relu'),
        layers.Dense(len(labels), activation='softmax'),
    ],
    labels=labels,
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq
)

# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)
```

At the end of the process you should find your Tensorflow model saved under `~/models/sound-detect`.
You can use it in your scripts to classify audio samples from audio sources.

Classifying audio samples
-------------------------

One use case is to analyze an audio file and use the model to detect specific sounds. Example:

```python
import os

from micmon.audio import AudioFile
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
cur_seconds = 60
sample_duration = 2

with AudioFile('/path/to/some/audio.mp3',
               start=cur_seconds, duration='10:00',
               sample_duration=sample_duration) as reader:
    for sample in reader:
        prediction = model.predict(sample)
        print(f'Audio segment at {cur_seconds} seconds: {prediction}')
        cur_seconds += sample_duration
```

Another is to analyze live audio samples imported from an audio device - e.g. a USB microphone.
Example:

```python
import os

from micmon.audio import AudioDevice
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
audio_system = 'alsa'        # Supported: alsa and pulse
audio_device = 'plughw:1,0'  # Get list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()  # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording
```

You can use these two examples as blueprints to set up your own automation routines
with sound detection.
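
For instance, here is a hedged sketch of such a routine: the consecutive-positives
threshold and the `notify` function are placeholders for your own logic, not part
of micmon, and it assumes that `model.predict` returns the label string:

```python
import os

from micmon.audio import AudioDevice
from micmon.model import Model


def notify(message: str):
    print(f'ALERT: {message}')  # replace with your own notification logic


model = Model.load(os.path.expanduser('~/models/sound-detect'))
positive_streak = 0

with AudioDevice('alsa', device='plughw:1,0') as source:
    for sample in source:
        source.pause()
        prediction = model.predict(sample)
        source.resume()

        # Only alert after a few consecutive positive chunks to filter out noise
        positive_streak = positive_streak + 1 if prediction == 'positive' else 0
        if positive_streak == 3:
            notify('Sound detected')
```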

examples/predict_from_audio_file.py (new file, 16 lines)

```python
import os

from micmon.audio import AudioFile
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
cur_seconds = 60
sample_duration = 2

with AudioFile('/path/to/some/audio.mp3', start=cur_seconds, duration='10:00',
               sample_duration=sample_duration) as reader:
    for sample in reader:
        prediction = model.predict(sample)
        print(f'Audio segment at {cur_seconds} seconds: {prediction}')
        cur_seconds += sample_duration
```

examples/predict_from_microphone.py (new file, 18 lines)

```python
import os

from micmon.audio import AudioDevice
from micmon.model import Model

# Path to a previously saved sound detection Tensorflow model
model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)

audio_system = 'alsa'        # Supported: alsa and pulse
audio_device = 'plughw:1,0'  # Get list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()  # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording
```

examples/train.py (new file, 54 lines)

```python
# This script shows how to train a neural network to detect sounds given a training set of collected frequency
# spectrum data.

import os
from keras import layers

from micmon.dataset import Dataset
from micmon.model import Model

# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/baby-monitor/datasets')

# This is the output directory where the model will be saved
model_dir = os.path.expanduser(os.path.join('~', 'models', 'baby-monitor'))

# This is the number of training epochs for each dataset sample
epochs = 2

# This value establishes the share of the dataset to be used for cross-validation
validation_split = 0.3

# Load the datasets from the compressed files
datasets = Dataset.scan(datasets_dir, validation_split=validation_split)

# Get the number of frequency bins
freq_bins = len(datasets[0].samples[0])

# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have as many units as the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2.0 * freq_bins), activation='relu'),
        layers.Dense(int(freq_bins), activation='relu'),
        layers.Dense(len(datasets[0].labels), activation='softmax'),
    ],
    labels=['negative', 'positive'],
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq,
)

# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)
```

```diff
@@ -1,4 +1,4 @@
 import logging
 import sys
 
-logging.basicConfig(level=logging.DEBUG, stream=sys.stdout)
+logging.basicConfig(level=logging.INFO, stream=sys.stdout)
```

```diff
@@ -1,24 +1,40 @@
 import json
+import os
+import pathlib
 from typing import Optional, List, Tuple, Union
 
-from micmon.audio import AudioDirectory, AudioSegment, AudioSource
+from micmon.audio import AudioSegment, AudioSource, AudioDirectory
 
 
 class AudioFile(AudioSource):
-    def __init__(self, path: AudioDirectory,
+    def __init__(self,
+                 audio_file: Union[str, AudioDirectory],
+                 labels_file: Optional[str] = None,
                  start: Union[str, int, float] = 0,
                  duration: Optional[Union[str, int, float]] = None,
                  *args, **kwargs):
         super().__init__(*args, **kwargs)
+        if isinstance(audio_file, AudioDirectory):
+            labels_file = audio_file.labels_file
+            audio_file = audio_file.audio_file
+
+        self.audio_file = os.path.abspath(os.path.expanduser(audio_file))
+
+        if not labels_file:
+            labels_file = os.path.join(pathlib.Path(self.audio_file).parent, 'labels.json')
+            if not os.path.isfile(labels_file):
+                labels_file = None
+
+        self.labels_file = os.path.abspath(os.path.expanduser(labels_file)) if labels_file else None
         self.ffmpeg_args = (
-            self.ffmpeg_bin, '-i', path.audio_file, *(('-ss', str(start)) if start else ()),
+            self.ffmpeg_bin, '-i', audio_file, *(('-ss', str(start)) if start else ()),
             *(('-t', str(duration)) if duration else ()), *self.ffmpeg_base_args
         )
 
         self.start = self.convert_time(start)/1000
         self.duration = self.convert_time(duration)/1000
-        self.segments = self.parse_labels_file(path.labels_file) \
-            if path.labels_file else []
+        self.segments = self.parse_labels_file(labels_file) \
+            if labels_file else []
 
         self.labels = sorted(list(set(label for timestamp, label in self.segments)))
         self.cur_time = self.start
@@ -53,4 +69,3 @@ class AudioFile(AudioSource):
             return audio
 
         raise StopIteration
-
```

```diff
@@ -2,7 +2,7 @@ import json
 import os
 import numpy as np
 
-from typing import List, Optional, Union, Tuple
+from typing import List, Optional, Union
 from keras import Sequential, losses, optimizers, metrics
 from keras.layers import Layer
 from keras.models import load_model, Model as _Model
@@ -20,10 +20,11 @@ class Model:
                  model: Optional[_Model] = None, optimizer: Union[str, optimizers.Optimizer] = 'adam',
                  loss: Union[str, losses.Loss] = losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics: List[Union[str, metrics.Metric]] = ('accuracy',),
-                 cutoff_frequencies: Tuple[int, int] = (AudioSegment.default_low_freq, AudioSegment.default_high_freq)):
+                 low_freq: int = AudioSegment.default_low_freq,
+                 high_freq: int = AudioSegment.default_high_freq):
         assert layers or model
         self.label_names = labels
-        self.cutoff_frequencies = list(map(int, cutoff_frequencies))
+        self.cutoff_frequencies = (int(low_freq), int(high_freq))
 
         if layers:
             self._model = Sequential(layers)
@@ -74,4 +75,4 @@ class Model:
         with open(freq_file, 'r') as f:
             frequencies = json.load(f)
 
-        return cls(model=model, labels=label_names, cutoff_frequencies=frequencies)
+        return cls(model=model, labels=label_names, low_freq=frequencies[0], high_freq=frequencies[1])
```

micmon/utils/__init__.py (new empty file)

micmon/utils/datagen.py (new file, 117 lines)

```python
import argparse
import logging
import os
import sys

from micmon.audio import AudioDirectory, AudioFile, AudioSegment
from micmon.dataset import DatasetWriter

logger = logging.getLogger(__name__)
defaults = {
    'sample_duration': 2.0,
    'sample_rate': 44100,
    'channels': 1,
    'ffmpeg_bin': 'ffmpeg',
}


def create_dataset(audio_dir: str, dataset_dir: str,
                   low_freq: int = AudioSegment.default_low_freq,
                   high_freq: int = AudioSegment.default_high_freq,
                   bins: int = AudioSegment.default_bins,
                   sample_duration: float = defaults['sample_duration'],
                   sample_rate: int = defaults['sample_rate'],
                   channels: int = defaults['channels'],
                   ffmpeg_bin: str = defaults['ffmpeg_bin']):
    audio_dir = os.path.abspath(os.path.expanduser(audio_dir))
    dataset_dir = os.path.abspath(os.path.expanduser(dataset_dir))
    audio_dirs = AudioDirectory.scan(audio_dir)

    for audio_dir in audio_dirs:
        dataset_file = os.path.join(dataset_dir, os.path.basename(audio_dir.path) + '.npz')
        logger.info(f'Processing audio sample {audio_dir.path}')

        with AudioFile(audio_dir.audio_file, audio_dir.labels_file,
                       sample_duration=sample_duration, sample_rate=sample_rate, channels=channels,
                       ffmpeg_bin=os.path.expanduser(ffmpeg_bin)) as reader, \
                DatasetWriter(dataset_file, low_freq=low_freq, high_freq=high_freq, bins=bins) as writer:
            for sample in reader:
                writer += sample


def main():
    # noinspection PyTypeChecker
    parser = argparse.ArgumentParser(
        description='''
Tool to create numpy dataset files with audio spectrum data from a set of labelled raw audio files.''',

        epilog='''
- audio_dir should contain a list of sub-directories, each of which represents a labelled audio sample.
  audio_dir should have the following structure:

  audio_dir/
    -> train_sample_1
      -> audio.mp3
      -> labels.json
    -> train_sample_2
      -> audio.mp3
      -> labels.json
    ...

- labels.json is a key-value JSON file that contains the labels for each audio segment. Example:

  {
    "00:00": "negative",
    "02:13": "positive",
    "04:57": "negative",
    "15:41": "positive",
    "18:24": "negative"
  }

  Each entry indicates that all the audio samples between the specified timestamp and the next entry or
  the end of the audio file should have the specified label applied.

- dataset_dir is the directory where the generated labelled spectrum dataset in .npz format will be saved.
  Each dataset file will be named like its associated audio samples directory.''',

        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument('audio_dir', help='Directory containing the raw audio samples directories to be scanned.')
    parser.add_argument('dataset_dir', help='Destination directory for the compressed .npz files containing the '
                                            'frequency spectrum datasets.')
    parser.add_argument('--low', help='Specify the lowest frequency to be considered in the generated frequency '
                                      'spectrum. Default: 20 Hz (lowest possible frequency audible to a human ear).',
                        required=False, default=AudioSegment.default_low_freq, dest='low_freq', type=int)

    parser.add_argument('--high', help='Specify the highest frequency to be considered in the generated frequency '
                                       'spectrum. Default: 20 kHz (highest possible frequency audible to a human ear).',
                        required=False, default=AudioSegment.default_high_freq, dest='high_freq', type=int)

    parser.add_argument('-b', '--bins', help=f'Specify the number of frequency bins to be used for the spectrum '
                                             f'analysis (default: {AudioSegment.default_bins})',
                        required=False, default=AudioSegment.default_bins, dest='bins', type=int)

    parser.add_argument('-d', '--sample-duration', help=f'The script will calculate the spectrum of audio segments of '
                                                        f'this specified length in seconds (default: '
                                                        f'{defaults["sample_duration"]}).',
                        required=False, default=defaults['sample_duration'], dest='sample_duration', type=float)

    parser.add_argument('-r', '--sample-rate', help=f'Audio sample rate (default: {defaults["sample_rate"]} Hz)',
                        required=False, default=defaults['sample_rate'], dest='sample_rate', type=int)

    parser.add_argument('-c', '--channels', help=f'Number of destination audio channels (default: '
                                                 f'{defaults["channels"]})',
                        required=False, default=defaults['channels'], dest='channels', type=int)

    parser.add_argument('--ffmpeg', help=f'Absolute path to the ffmpeg executable (default: {defaults["ffmpeg_bin"]})',
                        required=False, default=defaults['ffmpeg_bin'], dest='ffmpeg_bin', type=str)

    opts, args = parser.parse_known_args(sys.argv[1:])
    return create_dataset(audio_dir=opts.audio_dir, dataset_dir=opts.dataset_dir, low_freq=opts.low_freq,
                          high_freq=opts.high_freq, bins=opts.bins, sample_duration=opts.sample_duration,
                          sample_rate=opts.sample_rate, channels=opts.channels, ffmpeg_bin=opts.ffmpeg_bin)


if __name__ == '__main__':
    main()
```

File diff suppressed because one or more lines are too long

```diff
@@ -46,9 +46,7 @@
 "from micmon.audio import AudioDevice, AudioPlayer\n",
 "from micmon.model import Model\n",
 "\n",
-"basedir = os.path.expanduser(os.path.join('~', 'projects', 'baby-monitor'))\n",
+"model_dir = os.path.expanduser(os.path.join('~', 'models', 'baby-monitor'))\n",
-"models_dir = os.path.join(basedir, 'models')\n",
-"model_path = os.path.join(models_dir, 'baby-monitor')\n",
 "audio_system = 'alsa'\n",
 "audio_device = 'plughw:3,0'\n",
 "label_names = ['negative', 'positive']"
@@ -68,7 +66,7 @@
 "execution_count": 2,
 "outputs": [],
 "source": [
-"model = Model.load(model_path)"
+"model = Model.load(model_dir)"
 ],
 "metadata": {
 "collapsed": false,
@@ -122,23 +120,17 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
+"negative\n",
+"negative\n",
+"negative\n",
+"negative\n",
+"negative\n",
+"negative\n",
 "negative\n",
 "negative\n",
 "negative\n",
 "negative\n"
 ]
-},
-{
-"ename": "KeyboardInterrupt",
-"evalue": "",
-"output_type": "error",
-"traceback": [ (ANSI-escaped KeyboardInterrupt traceback omitted) ]
 }
 ],
 "source": [
@@ -147,7 +139,7 @@
 " source.pause()\n",
 " prediction = model.predict(sample)\n",
 " print(prediction)\n",
-" source.resume()"
+" source.resume()\n"
 ],
 "metadata": {
 "collapsed": false,
```

File diff suppressed because one or more lines are too long

requirements.txt (new file, 3 lines)

```
numpy
tensorflow
keras
```

setup.py (new executable file, 40 lines)

```python
#!/usr/bin/env python
import os

from setuptools import setup, find_packages


def path(fname=''):
    return os.path.abspath(os.path.join(os.path.dirname(__file__), fname))


def readfile(fname):
    with open(path(fname)) as f:
        return f.read()


setup(
    name="micmon",
    version="0.1",
    author="Fabio Manganiello",
    author_email="info@fabiomanganiello.com",
    description="Programmable Tensorflow-based sound/noise detector",
    license="MIT",
    python_requires='>= 3.6',
    keywords="machine-learning tensorflow sound-detection",
    url="https://github.com/BlackLight/micmon",
    packages=find_packages(),
    include_package_data=True,
    long_description=readfile('README.md'),
    long_description_content_type='text/markdown',
    entry_points={
        'console_scripts': [
            'micmon-datagen=micmon.utils.datagen:main',
        ],
    },
    classifiers=[
        "Topic :: Utilities",
        "License :: OSI Approved :: MIT License",
        "Development Status :: 3 - Alpha",
    ],
)
```