Mirror of https://github.com/BlackLight/micmon.git (synced 2024-11-24 04:35:13 +01:00)

Commit d867880199: Added proper README and examples
Parent: 2f578929fb

16 changed files with 764 additions and 120 deletions

.gitignore (vendored, 3 lines added)

```diff
@@ -3,3 +3,6 @@
 /data/
 /models/
 __pycache__
+/build
+/dist
+*.egg-info
```

LICENSE.txt (new file, 22 lines)

```
MIT License

Copyright (c) 2017, 2020 Fabio Manganiello

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

README.md (new file, 313 lines)

micmon
======

*micmon* is an ML-powered library to detect sounds in an audio stream,
either from a file or from an audio input device. The use case for its
development has been the creation of a self-built baby monitor to detect the
cries of my newborn through a Raspberry Pi + USB microphone, but it should be
good enough to detect any type of noise or audio if used with a well-trained
model.

It works by splitting an audio stream into short segments, calculating the
FFT and spectrum bins for each of these segments, and using that spectrum
data to train a model to detect the target audio. It works well with sounds
that are loud enough to stand out from the background (it's good at detecting
e.g. the sound of an alarm clock, not the sound of a flying mosquito), that
are long enough compared to the size of the chunks (very short sounds will
leave a very small trace in the spectrum of an audio chunk) and, even better,
whose frequency bandwidth doesn't overlap a lot with other sounds (it's good
at detecting the cries of your baby, since his/her voice has a higher pitch
than yours, but it may not detect differences in the spectral signature of
the voices of two adult men in the same age group). It's not going to perform
very well if instead you try to use it to detect speech: since it operates
on time-agnostic frequency data from chunks of audio, it's not granular enough
for proper speech-to-text applications, and it wouldn't be robust enough to
detect differences in voice pitch, tone or accent.
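
To make the chunk-to-features step more concrete, here is a minimal sketch of
the idea (this is not micmon's actual implementation; the function name, the
bucketing strategy and the normalization below are simplifying assumptions):

```python
import numpy as np


def spectrum_bins(chunk: np.ndarray, sample_rate: int = 44100,
                  low_freq: int = 250, high_freq: int = 7500,
                  bins: int = 100) -> np.ndarray:
    # Magnitude spectrum of a short mono PCM chunk
    spectrum = np.abs(np.fft.rfft(chunk))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)

    # Keep only the frequency band of interest
    band = spectrum[(freqs >= low_freq) & (freqs <= high_freq)]

    # Reduce the band to a fixed number of buckets: a fixed-size vector of
    # this kind is what gets fed to the classifier.
    features = np.array([b.mean() for b in np.array_split(band, bins)])
    return features / max(features.max(), 1e-9)
```

Each labelled chunk then becomes one (feature vector, label) pair in the
training set.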

Dependencies
------------

The software uses *ffmpeg* to record and decode audio - check the instructions
for your OS on how to get it installed. It also requires *lame* or any other
mp3 encoder to encode the captured audio to mp3.

Python dependencies:

```bash
pip install numpy tensorflow keras

# Optional, for graphs
pip install matplotlib
```

Installation
------------

```bash
git clone https://github.com/BlackLight/micmon
cd micmon
python setup.py install
```

Audio capture
-------------

Once the software is installed, you can proceed with recording some audio that
will be used for training the model. First create a directory for your audio
samples dataset:

```bash
# This folder will store our audio samples
mkdir -p ~/datasets/sound-detect/audio

# This folder will store the datasets
# generated from the labelled audio samples
mkdir -p ~/datasets/sound-detect/data

# This folder will store the generated
# Tensorflow models
mkdir -p ~/models

cd ~/datasets/sound-detect/audio
```

Then create a new sub-folder for your first audio sample and start recording.
Example:

```bash
mkdir sample_1
cd sample_1
arecord -D plughw:0,1 -f cd | lame - audio.mp3
```

In the example above we are using *arecord* to record from the second device
of the first audio card (check the list of available recording devices with
*arecord -l*) in WAV format, and we are then using the *lame* encoder to
convert the raw audio to mp3. When you are done recording, just Ctrl-C the
application and your audio file will be ready.

Audio labelling
---------------

In the same directory as your sample (in the example above it will be
`~/datasets/sound-detect/audio/sample_1`) create a new file named
`labels.json`. Now open your audio file in Audacity or any audio player
and identify the audio segments that match your criteria - for example
when your baby is crying, when the alarm starts, when your neighbour
starts drilling the wall, or whatever your criteria are. `labels.json`
should contain a key-value mapping in the form of `start_time -> label`.
Example:

```json
{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}
```

In the example above, all the audio segments between 00:00 and 02:12 will
be labelled as negative, all the segments between 02:13 and 04:56 as
positive, and so on.
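
If you want to sanity-check a labels file, the mapping can be expanded into
explicit segments with a few lines of plain Python (this is just an
illustration, not part of micmon's API):

```python
import json

with open('labels.json') as f:
    labels = json.load(f)

# Sort the "MM:SS" timestamps and pair each one with the next: every label
# applies from its own timestamp up to the following one (or the end of file).
segments = sorted(labels.items(), key=lambda kv: [int(x) for x in kv[0].split(':')])
for (start, label), (end, _) in zip(segments, segments[1:] + [('end of file', '')]):
    print(f'{start} -> {end}: {label}')
```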

You can now use *micmon* to generate a frequency spectrum dataset out of
your labelled audio. You can do it either through the `micmon-datagen`
script or with your own script.

### micmon-datagen

Type `micmon-datagen --help` to get a full list of the available options.
In general, `micmon-datagen` requires a directory that contains the labelled
audio sample sub-directories as input and a directory where the calculated
numpy-compressed datasets will be stored. If you want to generate the dataset
for the audio samples captured in the previous step, the command will be
something like this:

```bash
micmon-datagen --low 250 --high 7500 --bins 100 --sample-duration 2 --channels 1 \
    ~/datasets/sound-detect/audio ~/datasets/sound-detect/data
```

The `--low` and `--high` options respectively identify the lowest and highest
frequencies that should be taken into account in the output spectrum. By default
these values are 20 Hz and 20 kHz (respectively the lowest and highest frequencies
audible to a healthy and young human ear), but you can narrow down the frequency
space to only detect the frequencies that you're interested in and to remove
high-frequency harmonics that may spoil your data. A good way to estimate the
frequency space is to use e.g. Audacity or any audio equalizer to select the
segments of your audio that contain the sounds that you want to detect and
check their dominant frequencies - you definitely want those frequencies to be
included in your range.
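
If you prefer to check the dominant frequencies programmatically, the same
micmon API used later in this README can plot the spectrum of a labelled
sample (this sketch assumes, as in the examples below, that
`AudioDirectory.scan` returns a list and that matplotlib is installed):

```python
import os

from micmon.audio import AudioDirectory, AudioFile

audio_dirs = AudioDirectory.scan(os.path.expanduser('~/datasets/sound-detect/audio'))

# Plot the spectrum of a few chunks around a known "positive" segment
# to eyeball which frequency range carries the signal.
with AudioFile(audio_dirs[0], start='02:13', duration=10) as reader:
    for sample in reader:
        sample.plot_spectrum(low_freq=20, high_freq=20000)
```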

`--bins` specifies how many segments/buckets the frequency spectrum should be
split into - 100 bins is the default value. `--sample-duration` specifies the
duration in seconds of each spectrum data point - 2 seconds is the default
value, i.e. the audio samples will be read in chunks of 2 seconds each and the
spectrum will be calculated for each of these chunks. If the sounds you want to
detect are shorter, then you may want to reduce this value.
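
The same parameters can also be passed programmatically through the
`create_dataset` helper that backs `micmon-datagen` (added in
`micmon/utils/datagen.py` in this commit). A sketch with shorter 0.5-second
chunks, using the directories created earlier in this README:

```python
from micmon.utils.datagen import create_dataset

# Equivalent of the micmon-datagen call above, but with 0.5-second chunks
# for shorter sounds. Both paths are expanded by the helper itself.
create_dataset('~/datasets/sound-detect/audio', '~/datasets/sound-detect/data',
               low_freq=250, high_freq=7500, bins=100,
               sample_duration=0.5, channels=1)
```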

### Generate the dataset via script

The other way to generate the dataset from the audio is through the *micmon* API
itself. This option also enables you to take a peek at the audio data to better
calibrate the parameters. For example:

```python
import os

from micmon.audio import AudioDirectory, AudioPlayer, AudioFile
from micmon.dataset import DatasetWriter

basedir = os.path.expanduser('~/datasets/sound-detect')
audio_dir = os.path.join(basedir, 'audio/sample_1')
datasets_dir = os.path.join(basedir, 'data')
cutoff_frequencies = [250, 7500]

# Scan the base audio_dir for labelled audio samples
audio_dirs = AudioDirectory.scan(audio_dir)

# Play some audio samples starting from 01:00
for audio_dir in audio_dirs:
    with AudioFile(audio_dir, start='01:00', duration=5) as reader, \
            AudioPlayer() as player:
        for sample in reader:
            player.play(sample)

# Plot the audio and spectrum of the audio samples in the first 10 seconds
# of each audio file.
for audio_dir in audio_dirs:
    with AudioFile(audio_dir, start=0, duration=10) as reader:
        for sample in reader:
            sample.plot_audio()
            sample.plot_spectrum(low_freq=cutoff_frequencies[0],
                                 high_freq=cutoff_frequencies[1])

# Save the spectrum information and labels of the samples to a
# different compressed file for each audio file.
for audio_dir in audio_dirs:
    dataset_file = os.path.join(datasets_dir, os.path.basename(audio_dir.path) + '.npz')
    print(f'Processing audio sample {audio_dir.path}')

    with AudioFile(audio_dir) as reader, \
            DatasetWriter(dataset_file,
                          low_freq=cutoff_frequencies[0],
                          high_freq=cutoff_frequencies[1]) as writer:
        for sample in reader:
            writer += sample
```

Training the model
------------------

Once you have some `.npz` datasets saved under `~/datasets/sound-detect/data`, you can
use those datasets to train a Tensorflow+Keras model to classify an audio segment. A full
example is available under `examples/train.py`:

```python
import os
from keras import layers

from micmon.dataset import Dataset
from micmon.model import Model

# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')

# This is the output directory where the model will be saved
model_dir = os.path.expanduser('~/models/sound-detect')

# This is the number of training epochs for each dataset sample
epochs = 2

# Load the datasets from the compressed files.
# 70% of the data points will be included in the training set,
# 30% of the data points will be included in the evaluation set
# and used to evaluate the performance of the model.
datasets = Dataset.scan(datasets_dir, validation_split=0.3)
labels = ['negative', 'positive']
freq_bins = len(datasets[0].samples[0])

# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have 75% of the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2 * freq_bins), activation='relu'),
        layers.Dense(int(0.75 * freq_bins), activation='relu'),
        layers.Dense(len(labels), activation='softmax'),
    ],
    labels=labels,
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq
)

# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)
```

At the end of the process you should find your Tensorflow model saved under `~/models/sound-detect`.
You can use it in your scripts to classify audio samples from audio sources.

Classifying audio samples
-------------------------

One use case is to analyze an audio file and use the model to detect specific sounds. Example:

```python
import os

from micmon.audio import AudioFile
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
cur_seconds = 60
sample_duration = 2

with AudioFile('/path/to/some/audio.mp3',
               start=cur_seconds, duration='10:00',
               sample_duration=sample_duration) as reader:
    for sample in reader:
        prediction = model.predict(sample)
        print(f'Audio segment at {cur_seconds} seconds: {prediction}')
        cur_seconds += sample_duration
```

Another is to analyze live audio samples imported from an audio device - e.g. a USB microphone.
Example:

```python
import os

from micmon.audio import AudioDevice
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
audio_system = 'alsa'        # Supported: alsa and pulse
audio_device = 'plughw:1,0'  # Get list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()  # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording
```

You can use these two examples as blueprints to set up your own automation routines
with sound detection.
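
For instance, here is a hedged sketch of such a routine: the consecutive-positives
threshold and the `notify` function are placeholders for your own logic, not part
of micmon, and it assumes that `model.predict` returns the label string:

```python
import os

from micmon.audio import AudioDevice
from micmon.model import Model


def notify(message: str):
    print(f'ALERT: {message}')  # replace with your own notification logic


model = Model.load(os.path.expanduser('~/models/sound-detect'))
positive_streak = 0

with AudioDevice('alsa', device='plughw:1,0') as source:
    for sample in source:
        source.pause()
        prediction = model.predict(sample)
        source.resume()

        # Only alert after a few consecutive positive chunks to filter out noise
        positive_streak = positive_streak + 1 if prediction == 'positive' else 0
        if positive_streak == 3:
            notify('Sound detected')
```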

examples/predict_from_audio_file.py (new file, 16 lines)

```python
import os

from micmon.audio import AudioFile
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
cur_seconds = 60
sample_duration = 2

with AudioFile('/path/to/some/audio.mp3', start=cur_seconds, duration='10:00',
               sample_duration=sample_duration) as reader:
    for sample in reader:
        prediction = model.predict(sample)
        print(f'Audio segment at {cur_seconds} seconds: {prediction}')
        cur_seconds += sample_duration
```

examples/predict_from_microphone.py (new file, 18 lines)

```python
import os

from micmon.audio import AudioDevice
from micmon.model import Model

# Path to a previously saved sound detection Tensorflow model
model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)

audio_system = 'alsa'        # Supported: alsa and pulse
audio_device = 'plughw:1,0'  # Get list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()  # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording
```

examples/train.py (new file, 54 lines)

```python
# This script shows how to train a neural network to detect sounds given a training set of collected frequency
# spectrum data.

import os
from keras import layers

from micmon.dataset import Dataset
from micmon.model import Model

# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/baby-monitor/datasets')

# This is the output directory where the model will be saved
model_dir = os.path.expanduser(os.path.join('~', 'models', 'baby-monitor'))

# This is the number of training epochs for each dataset sample
epochs = 2

# This value establishes the share of the dataset to be used for cross-validation
validation_split = 0.3

# Load the datasets from the compressed files
datasets = Dataset.scan(datasets_dir, validation_split=validation_split)

# Get the number of frequency bins
freq_bins = len(datasets[0].samples[0])

# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have as many units as the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2.0 * freq_bins), activation='relu'),
        layers.Dense(int(freq_bins), activation='relu'),
        layers.Dense(len(datasets[0].labels), activation='softmax'),
    ],
    labels=['negative', 'positive'],
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq,
)

# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)
```

```diff
@@ -1,4 +1,4 @@
 import logging
 import sys
 
-logging.basicConfig(level=logging.DEBUG, stream=sys.stdout)
+logging.basicConfig(level=logging.INFO, stream=sys.stdout)
```

```diff
@@ -1,24 +1,40 @@
 import json
+import os
+import pathlib
 from typing import Optional, List, Tuple, Union
 
-from micmon.audio import AudioDirectory, AudioSegment, AudioSource
+from micmon.audio import AudioSegment, AudioSource, AudioDirectory
 
 
 class AudioFile(AudioSource):
-    def __init__(self, path: AudioDirectory,
+    def __init__(self,
+                 audio_file: Union[str, AudioDirectory],
+                 labels_file: Optional[str] = None,
                  start: Union[str, int, float] = 0,
                  duration: Optional[Union[str, int, float]] = None,
                  *args, **kwargs):
         super().__init__(*args, **kwargs)
+        if isinstance(audio_file, AudioDirectory):
+            labels_file = audio_file.labels_file
+            audio_file = audio_file.audio_file
+
+        self.audio_file = os.path.abspath(os.path.expanduser(audio_file))
+
+        if not labels_file:
+            labels_file = os.path.join(pathlib.Path(self.audio_file).parent, 'labels.json')
+            if not os.path.isfile(labels_file):
+                labels_file = None
+
+        self.labels_file = os.path.abspath(os.path.expanduser(labels_file)) if labels_file else None
         self.ffmpeg_args = (
-            self.ffmpeg_bin, '-i', path.audio_file, *(('-ss', str(start)) if start else ()),
+            self.ffmpeg_bin, '-i', audio_file, *(('-ss', str(start)) if start else ()),
             *(('-t', str(duration)) if duration else ()), *self.ffmpeg_base_args
         )
 
         self.start = self.convert_time(start)/1000
         self.duration = self.convert_time(duration)/1000
-        self.segments = self.parse_labels_file(path.labels_file) \
-            if path.labels_file else []
+        self.segments = self.parse_labels_file(labels_file) \
+            if labels_file else []
 
         self.labels = sorted(list(set(label for timestamp, label in self.segments)))
         self.cur_time = self.start
@@ -53,4 +69,3 @@ class AudioFile(AudioSource):
             return audio
 
         raise StopIteration
-
```

```diff
@@ -2,7 +2,7 @@ import json
 import os
 import numpy as np
 
-from typing import List, Optional, Union, Tuple
+from typing import List, Optional, Union
 from keras import Sequential, losses, optimizers, metrics
 from keras.layers import Layer
 from keras.models import load_model, Model as _Model
@@ -20,10 +20,11 @@ class Model:
                  model: Optional[_Model] = None, optimizer: Union[str, optimizers.Optimizer] = 'adam',
                  loss: Union[str, losses.Loss] = losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics: List[Union[str, metrics.Metric]] = ('accuracy',),
-                 cutoff_frequencies: Tuple[int, int] = (AudioSegment.default_low_freq, AudioSegment.default_high_freq)):
+                 low_freq: int = AudioSegment.default_low_freq,
+                 high_freq: int = AudioSegment.default_high_freq):
         assert layers or model
         self.label_names = labels
-        self.cutoff_frequencies = list(map(int, cutoff_frequencies))
+        self.cutoff_frequencies = (int(low_freq), int(high_freq))
 
         if layers:
             self._model = Sequential(layers)
@@ -74,4 +75,4 @@ class Model:
         with open(freq_file, 'r') as f:
             frequencies = json.load(f)
 
-        return cls(model=model, labels=label_names, cutoff_frequencies=frequencies)
+        return cls(model=model, labels=label_names, low_freq=frequencies[0], high_freq=frequencies[1])
```

micmon/utils/__init__.py (new empty file)

micmon/utils/datagen.py (new file, 117 lines)

```python
import argparse
import logging
import os
import sys

from micmon.audio import AudioDirectory, AudioFile, AudioSegment
from micmon.dataset import DatasetWriter

logger = logging.getLogger(__name__)
defaults = {
    'sample_duration': 2.0,
    'sample_rate': 44100,
    'channels': 1,
    'ffmpeg_bin': 'ffmpeg',
}


def create_dataset(audio_dir: str, dataset_dir: str,
                   low_freq: int = AudioSegment.default_low_freq,
                   high_freq: int = AudioSegment.default_high_freq,
                   bins: int = AudioSegment.default_bins,
                   sample_duration: float = defaults['sample_duration'],
                   sample_rate: int = defaults['sample_rate'],
                   channels: int = defaults['channels'],
                   ffmpeg_bin: str = defaults['ffmpeg_bin']):
    audio_dir = os.path.abspath(os.path.expanduser(audio_dir))
    dataset_dir = os.path.abspath(os.path.expanduser(dataset_dir))
    audio_dirs = AudioDirectory.scan(audio_dir)

    for audio_dir in audio_dirs:
        dataset_file = os.path.join(dataset_dir, os.path.basename(audio_dir.path) + '.npz')
        logger.info(f'Processing audio sample {audio_dir.path}')

        with AudioFile(audio_dir.audio_file, audio_dir.labels_file,
                       sample_duration=sample_duration, sample_rate=sample_rate, channels=channels,
                       ffmpeg_bin=os.path.expanduser(ffmpeg_bin)) as reader, \
                DatasetWriter(dataset_file, low_freq=low_freq, high_freq=high_freq, bins=bins) as writer:
            for sample in reader:
                writer += sample


def main():
    # noinspection PyTypeChecker
    parser = argparse.ArgumentParser(
        description='''
Tool to create numpy dataset files with audio spectrum data from a set of labelled raw audio files.''',

        epilog='''
- audio_dir should contain a list of sub-directories, each of which represents a labelled audio sample.
  audio_dir should have the following structure:

  audio_dir/
    -> train_sample_1
      -> audio.mp3
      -> labels.json
    -> train_sample_2
      -> audio.mp3
      -> labels.json
    ...

- labels.json is a key-value JSON file that contains the labels for each audio segment. Example:

  {
    "00:00": "negative",
    "02:13": "positive",
    "04:57": "negative",
    "15:41": "positive",
    "18:24": "negative"
  }

  Each entry indicates that all the audio samples between the specified timestamp and the next entry or
  the end of the audio file should have the specified label applied.

- dataset_dir is the directory where the generated labelled spectrum dataset in .npz format will be saved.
  Each dataset file will be named like its associated audio samples directory.''',

        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument('audio_dir', help='Directory containing the raw audio samples directories to be scanned.')
    parser.add_argument('dataset_dir', help='Destination directory for the compressed .npz files containing the '
                                            'frequency spectrum datasets.')
    parser.add_argument('--low', help='Specify the lowest frequency to be considered in the generated frequency '
                                      'spectrum. Default: 20 Hz (lowest possible frequency audible to a human ear).',
                        required=False, default=AudioSegment.default_low_freq, dest='low_freq', type=int)

    parser.add_argument('--high', help='Specify the highest frequency to be considered in the generated frequency '
                                       'spectrum. Default: 20 kHz (highest possible frequency audible to a human ear).',
                        required=False, default=AudioSegment.default_high_freq, dest='high_freq', type=int)

    parser.add_argument('-b', '--bins', help=f'Specify the number of frequency bins to be used for the spectrum '
                                             f'analysis (default: {AudioSegment.default_bins})',
                        required=False, default=AudioSegment.default_bins, dest='bins', type=int)

    parser.add_argument('-d', '--sample-duration', help=f'The script will calculate the spectrum of audio segments of '
                                                        f'this specified length in seconds (default: '
                                                        f'{defaults["sample_duration"]}).',
                        required=False, default=defaults['sample_duration'], dest='sample_duration', type=float)

    parser.add_argument('-r', '--sample-rate', help=f'Audio sample rate (default: {defaults["sample_rate"]} Hz)',
                        required=False, default=defaults['sample_rate'], dest='sample_rate', type=int)

    parser.add_argument('-c', '--channels', help=f'Number of destination audio channels (default: '
                                                 f'{defaults["channels"]})',
                        required=False, default=defaults['channels'], dest='channels', type=int)

    parser.add_argument('--ffmpeg', help=f'Absolute path to the ffmpeg executable (default: {defaults["ffmpeg_bin"]})',
                        required=False, default=defaults['ffmpeg_bin'], dest='ffmpeg_bin', type=str)

    opts, args = parser.parse_known_args(sys.argv[1:])
    return create_dataset(audio_dir=opts.audio_dir, dataset_dir=opts.dataset_dir, low_freq=opts.low_freq,
                          high_freq=opts.high_freq, bins=opts.bins, sample_duration=opts.sample_duration,
                          sample_rate=opts.sample_rate, channels=opts.channels, ffmpeg_bin=opts.ffmpeg_bin)


if __name__ == '__main__':
    main()
```

File diff suppressed because one or more lines are too long

```diff
@@ -46,9 +46,7 @@
 "from micmon.audio import AudioDevice, AudioPlayer\n",
 "from micmon.model import Model\n",
 "\n",
-"basedir = os.path.expanduser(os.path.join('~', 'projects', 'baby-monitor'))\n",
+"model_dir = os.path.expanduser(os.path.join('~', 'models', 'baby-monitor'))\n",
-"models_dir = os.path.join(basedir, 'models')\n",
-"model_path = os.path.join(models_dir, 'baby-monitor')\n",
 "audio_system = 'alsa'\n",
 "audio_device = 'plughw:3,0'\n",
 "label_names = ['negative', 'positive']"
@@ -68,7 +66,7 @@
 "execution_count": 2,
 "outputs": [],
 "source": [
-"model = Model.load(model_path)"
+"model = Model.load(model_dir)"
 ],
 "metadata": {
 "collapsed": false,
@@ -122,23 +120,17 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
+"negative\n",
+"negative\n",
+"negative\n",
+"negative\n",
+"negative\n",
+"negative\n",
 "negative\n",
 "negative\n",
 "negative\n",
 "negative\n"
 ]
-},
-{
-"ename": "KeyboardInterrupt",
-"evalue": "",
-"output_type": "error",
-"traceback": [ (ANSI-escaped KeyboardInterrupt traceback omitted) ]
 }
 ],
 "source": [
@@ -147,7 +139,7 @@
 " source.pause()\n",
 " prediction = model.predict(sample)\n",
 " print(prediction)\n",
-" source.resume()"
+" source.resume()\n"
 ],
 "metadata": {
 "collapsed": false,
```

File diff suppressed because one or more lines are too long

requirements.txt (new file, 3 lines)

```
numpy
tensorflow
keras
```

setup.py (new executable file, 40 lines)

```python
#!/usr/bin/env python
import os

from setuptools import setup, find_packages


def path(fname=''):
    return os.path.abspath(os.path.join(os.path.dirname(__file__), fname))


def readfile(fname):
    with open(path(fname)) as f:
        return f.read()


setup(
    name="micmon",
    version="0.1",
    author="Fabio Manganiello",
    author_email="info@fabiomanganiello.com",
    description="Programmable Tensorflow-based sound/noise detector",
    license="MIT",
    python_requires='>= 3.6',
    keywords="machine-learning tensorflow sound-detection",
    url="https://github.com/BlackLight/micmon",
    packages=find_packages(),
    include_package_data=True,
    long_description=readfile('README.md'),
    long_description_content_type='text/markdown',
    entry_points={
        'console_scripts': [
            'micmon-datagen=micmon.utils.datagen:main',
        ],
    },
    classifiers=[
        "Topic :: Utilities",
        "License :: OSI Approved :: MIT License",
        "Development Status :: 3 - Alpha",
    ],
)
```