Mirror of https://github.com/BlackLight/micmon.git
Synced 2024-12-26 18:45:11 +01:00

Added proper README and examples

parent 2f578929fb
commit d867880199

16 changed files with 764 additions and 120 deletions

.gitignore (vendored, 3 changed lines)

@@ -3,3 +3,6 @@
/data/
/models/
__pycache__
/build
/dist
*.egg-info

LICENSE.txt (normal file, 22 changed lines)

@@ -0,0 +1,22 @@
MIT License

Copyright (c) 2017, 2020 Fabio Manganiello

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md (normal file, 313 changed lines)

@@ -0,0 +1,313 @@
micmon
======

*micmon* is an ML-powered library for detecting sounds in an audio stream,
either from a file or from an audio input. It was developed to build a
self-made baby monitor that detects the cries of my newborn through a
Raspberry Pi and a USB microphone. It should, however, be good enough to
detect any type of noise or audio if it is used with a well-trained model.

It works by splitting an audio stream into short segments, calculating the
FFT and the spectrum bins of each segment, and using that spectrum data to
train a model that detects the sound. It works best with sounds that meet
three conditions. First, they should be loud enough to stand out from the
background (it is good at detecting e.g. the sound of an alarm clock, but
not the sound of a flying mosquito). Second, they should be long enough
compared to the size of the chunks, because very short sounds leave only a
small trace in the spectrum of an audio chunk. Third, and even better, their
frequency bandwidth should not overlap much with other sounds. For example,
it is good at detecting the cries of your baby, since his/her voice has a
higher pitch than yours, but it may not detect differences in the spectral
signatures of the voices of two adult men in the same age group.

It is not going to perform well if you try to use it to detect speech. It
operates on time-agnostic frequency data from chunks of audio, so it is not
granular enough for proper speech-to-text applications, and it would not be
robust enough to detect differences in voice pitch, tone or accent.
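
The NumPy sketch below is a rough illustration of the spectrum step described above. It computes the magnitude spectrum of one audio chunk and sums it into a fixed number of frequency bins. This is not micmon's implementation. The function name `spectrum_bins`, the linear spacing of the bins and the normalization to a maximum of 1 are all assumptions made for the example.

```python
import numpy as np


def spectrum_bins(chunk: np.ndarray, sample_rate: int,
                  low_freq: float = 20.0, high_freq: float = 20000.0,
                  bins: int = 100) -> np.ndarray:
    """Collapse the magnitude spectrum of one mono chunk into `bins` buckets.

    Hypothetical helper: linearly spaced buckets between low_freq and
    high_freq, normalized so that the largest bucket equals 1.
    """
    # Magnitude of the real FFT and the frequency (Hz) of each FFT point
    magnitudes = np.abs(np.fft.rfft(chunk))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)

    # Equally spaced bucket edges over the frequency range of interest;
    # FFT points outside [low_freq, high_freq] are ignored
    edges = np.linspace(low_freq, high_freq, bins + 1)
    totals, _ = np.histogram(freqs, bins=edges, weights=magnitudes)

    peak = totals.max()
    return totals / peak if peak > 0 else totals


# Example: a 2-second, 1 kHz tone sampled at 44.1 kHz
sample_rate = 44100
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spectrum = spectrum_bins(tone, sample_rate, low_freq=250, high_freq=7500, bins=100)
print(spectrum.shape, spectrum.argmax())
```

With a pure 1 kHz tone, nearly all of the energy lands in the bin that spans 975 to 1047.5 Hz. That is index 10 when the range from 250 to 7500 Hz is split into 100 bins. A fingerprint like this is what a model learns to recognize.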

Dependencies
------------

The software uses *ffmpeg* to record and decode audio. Check the instructions
for your OS on how to install it. It also requires *lame*, or any other mp3
encoder, to encode the captured audio to mp3.

Python dependencies:

```bash
pip install numpy tensorflow keras

# Optional, for graphs
pip install matplotlib
```

Installation
------------

```bash
git clone https://github.com/BlackLight/micmon
cd micmon
python setup.py install
```

Audio capture
-------------

Once the software is installed, you can start recording some audio to use
for training the model. First, create the directories for your audio samples
and datasets:

```bash
# This folder will store our audio samples
mkdir -p ~/datasets/sound-detect/audio

# This folder will store the datasets
# generated from the labelled audio samples
mkdir -p ~/datasets/sound-detect/data

# This folder will store the generated
# TensorFlow models
mkdir -p ~/models

cd ~/datasets/sound-detect/audio
```

Then create a new sub-folder for your first audio sample and start recording.
Example:

```bash
mkdir sample_1
cd sample_1
arecord -D plughw:0,1 -f cd | lame - audio.mp3
```

In the example above, *arecord* records CD-quality WAV audio (`-f cd`:
16-bit little-endian, 44100 Hz, stereo) from device 1 of sound card 0
(`plughw:0,1`). You can list the available capture devices with *arecord -l*.
The output is piped to the *lame* encoder, which converts the raw audio to mp3.
When you are done recording, press Ctrl-C to stop *arecord*, and your audio
file will be ready.
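
As a rough sizing aid, the snippet below computes how much raw PCM data the `-f cd` format produces per second, per 2-second analysis chunk and per hour. The figures refer to raw audio before mp3 encoding. The helper name `pcm_bytes` is ours, and 16-bit samples are assumed throughout.

```python
def pcm_bytes(seconds: float, sample_rate: int = 44100,
              channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Size in bytes of raw PCM audio (defaults match `arecord -f cd`)."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


print(pcm_bytes(1))                 # one second of CD-quality stereo
print(pcm_bytes(2, channels=1))     # one 2-second mono chunk, 16-bit samples
print(pcm_bytes(3600) / 1024 ** 2)  # one hour of CD-quality stereo, in MiB
```

One hour of CD-quality stereo comes to roughly 600 MiB of raw audio. This is why the example pipes the stream straight into *lame* instead of keeping the WAV file.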

Audio labelling
---------------

In the same directory as your sample (in the example above,
`~/datasets/sound-detect/audio/sample_1`), create a new file named
`labels.json`. Then open your audio file in Audacity or any audio player
and identify the audio segments that match your criteria. Examples include
when your baby is crying, when the alarm starts, or when your neighbour
starts drilling the wall. `labels.json` should contain a key-value mapping
in the form `start_time -> label`. Example:

```json
{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}
```

In the example above, all the audio segments between 00:00 and 02:12 will
be labelled as negative, all the segments between 02:13 and 04:56 as
positive, and so on. The last label applies until the end of the audio file.
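
The labelling rule above amounts to this: each timestamp opens a segment that lasts until the next timestamp, or until the end of the file. The sketch below shows that lookup. It is not micmon's own parser (the library's label parsing is not shown in this section), and the names `parse_timestamp`, `load_segments` and `label_at` are hypothetical.

```python
import json
from bisect import bisect_right
from typing import List, Tuple


def parse_timestamp(ts: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into seconds."""
    seconds = 0
    for part in ts.split(':'):
        seconds = seconds * 60 + int(part)
    return seconds


def load_segments(labels_json: str) -> List[Tuple[int, str]]:
    """Return (start_second, label) pairs sorted by start time."""
    mapping = json.loads(labels_json)
    return sorted((parse_timestamp(ts), label) for ts, label in mapping.items())


def label_at(segments: List[Tuple[int, str]], second: float) -> str:
    """Label of the segment containing `second`: the last start <= second."""
    starts = [start for start, _ in segments]
    idx = bisect_right(starts, second) - 1
    if idx < 0:
        raise ValueError(f'{second}s is before the first labelled timestamp')
    return segments[idx][1]


labels = ('{"00:00": "negative", "02:13": "positive", "04:57": "negative", '
          '"15:41": "positive", "18:24": "negative"}')
segments = load_segments(labels)
print(label_at(segments, 132))   # 02:12
print(label_at(segments, 133))   # 02:13
print(label_at(segments, 1200))  # 20:00, after the last timestamp
```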

You can now use *micmon* to generate a frequency spectrum dataset from
your labelled audio. You can do this either through the `micmon-datagen`
script or with your own script.

### micmon-datagen

Type `micmon-datagen --help` to get a full list of the available options.
In general, `micmon-datagen` takes two directories as input: one containing
the labelled audio sample sub-directories, and one where the calculated
numpy-compressed (`.npz`) datasets will be stored. To generate the datasets
for the audio samples captured in the previous step, the command will be
something like this:

```bash
micmon-datagen --low 250 --high 7500 --bins 100 --sample-duration 2 --channels 1 \
    ~/datasets/sound-detect/audio ~/datasets/sound-detect/data
```

The `--low` and `--high` options set the lowest and highest frequencies,
respectively, that are taken into account in the output spectrum. By default
these values are 20 Hz and 20 kHz, roughly the lowest and highest frequencies
audible to a young and healthy human ear. You can narrow the frequency range
down to the frequencies you are interested in, which also removes
high-frequency harmonics that may spoil your data. A good way to estimate the
range is to select the segments of your audio that contain the sounds you want
to detect in a tool such as Audacity or an audio equalizer, and check their
dominant frequencies. You definitely want those frequencies to be included in
your range.

`--bins` specifies how many segments/buckets the frequency spectrum is split
into (100 bins by default). `--sample-duration` specifies the duration in
seconds of each spectrum data point (2 seconds by default). With the default,
the audio samples are read in chunks of 2 seconds each, and the spectrum is
calculated for each of these chunks. If the sounds you want to detect are
shorter, you may want to reduce this value.
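
To get a feel for how these options interact, here is some plain arithmetic (not micmon code). It assumes linearly spaced bins and a 44100 Hz sample rate, which is the `micmon-datagen` default. The function computes three things: the width of each bin, the frequency resolution of the FFT of one chunk (the inverse of the chunk duration), and, from those two, how many FFT points each bin aggregates. The helper `describe_params` is hypothetical.

```python
def describe_params(low_freq: float, high_freq: float, bins: int,
                    sample_duration: float, sample_rate: int = 44100) -> dict:
    """Back-of-the-envelope figures for a given set of datagen options."""
    bin_width = (high_freq - low_freq) / bins   # Hz covered by each bin
    fft_resolution = 1.0 / sample_duration      # Hz between adjacent FFT points
    return {
        'bin_width_hz': bin_width,
        'fft_resolution_hz': fft_resolution,
        'fft_points_per_bin': bin_width / fft_resolution,
        'samples_per_chunk': int(sample_rate * sample_duration),
    }


print(describe_params(250, 7500, 100, 2))
print(describe_params(250, 7500, 100, 0.5))
```

With the options used in the command above, each bin is 72.5 Hz wide and aggregates 145 FFT points. Shortening the chunks to 0.5 s coarsens the FFT resolution to 2 Hz, but each bin still covers about 36 FFT points. Under these assumptions, reducing `--sample-duration` for short sounds costs little in spectral detail at 100 bins.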

### Generate the dataset via script

The other way to generate the dataset from the audio is through the *micmon* API
itself. This option also lets you take a peek at the audio data to better
calibrate the parameters. For example:

```python
import os

from micmon.audio import AudioDirectory, AudioPlayer, AudioFile
from micmon.dataset import DatasetWriter

basedir = os.path.expanduser('~/datasets/sound-detect')
# Base directory containing one sub-directory per labelled audio sample
audio_dir = os.path.join(basedir, 'audio')
datasets_dir = os.path.join(basedir, 'data')
cutoff_frequencies = [250, 7500]

# Scan the base audio_dir for labelled audio samples
audio_dirs = AudioDirectory.scan(audio_dir)

# Play 5 seconds of each audio sample, starting from 01:00
for sample_dir in audio_dirs:
    with AudioFile(sample_dir.audio_file, sample_dir.labels_file,
                   start='01:00', duration=5) as reader, \
            AudioPlayer() as player:
        for sample in reader:
            player.play(sample)

# Plot the audio and spectrum of the audio samples in the first 10 seconds
# of each audio file.
for sample_dir in audio_dirs:
    with AudioFile(sample_dir.audio_file, sample_dir.labels_file,
                   start=0, duration=10) as reader:
        for sample in reader:
            sample.plot_audio()
            sample.plot_spectrum(low_freq=cutoff_frequencies[0],
                                 high_freq=cutoff_frequencies[1])

# Save the spectrum information and labels of the samples to a
# different compressed file for each audio file.
for sample_dir in audio_dirs:
    dataset_file = os.path.join(datasets_dir, os.path.basename(sample_dir.path) + '.npz')
    print(f'Processing audio sample {sample_dir.path}')

    with AudioFile(sample_dir.audio_file, sample_dir.labels_file) as reader, \
            DatasetWriter(dataset_file,
                          low_freq=cutoff_frequencies[0],
                          high_freq=cutoff_frequencies[1]) as writer:
        for sample in reader:
            writer += sample
```
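
The generated `.npz` files are NumPy compressed archives, so plain NumPy can inspect them. The names of the arrays that `DatasetWriter` stores are not documented in this section. The sketch below therefore builds a toy archive with hypothetical array names (`samples`, `classes`) and shows how to list and read the arrays of any `.npz` file. Running the same `np.load` loop on one of your generated files shows what it actually contains.

```python
import os
import tempfile

import numpy as np

# Toy archive shaped like a spectrum dataset: 5 data points x 100 bins,
# one integer class per data point. The array names are hypothetical.
rng = np.random.default_rng(0)
samples = rng.random((5, 100)).astype(np.float32)
classes = np.array([0, 0, 1, 1, 0])

path = os.path.join(tempfile.mkdtemp(), 'sample_1.npz')
np.savez_compressed(path, samples=samples, classes=classes)

# List the arrays stored in the archive, then read each one by name
with np.load(path) as data:
    names = list(data.files)
    for name in names:
        print(name, data[name].shape, data[name].dtype)
    restored = data['samples']
```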

Training the model
------------------

Once you have some `.npz` datasets saved under `~/datasets/sound-detect/data`, you can
use them to train a TensorFlow/Keras model that classifies audio segments. A full
example is available under `examples/train.py`:

```python
import os
from keras import layers

from micmon.dataset import Dataset
from micmon.model import Model

# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')

# This is the output directory where the model will be saved
model_dir = os.path.expanduser('~/models/sound-detect')

# This is the number of training epochs for each dataset sample
epochs = 2

# Load the datasets from the compressed files.
# 70% of the data points will be included in the training set,
# 30% of the data points will be included in the evaluation set
# and used to evaluate the performance of the model.
datasets = Dataset.scan(datasets_dir, validation_split=0.3)
labels = ['negative', 'positive']
freq_bins = len(datasets[0].samples[0])

# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have 75% of the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2 * freq_bins), activation='relu'),
        layers.Dense(int(0.75 * freq_bins), activation='relu'),
        layers.Dense(len(labels), activation='softmax'),
    ],
    labels=labels,
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq
)

# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)
```
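
The layer sizes above are derived from the number of frequency bins. You can sanity-check the size of the resulting network by counting the trainable parameters of each Dense layer: inputs × units weights, plus one bias per unit, which is how a Keras `Dense` layer with its default `use_bias=True` is sized. This is plain arithmetic, independent of micmon. It assumes 100 frequency bins (the `--bins 100` used earlier) and two labels.

```python
def dense_params(n_in: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_units + n_units


freq_bins = 100
n_labels = 2
# Input, two hidden layers and output, sized as in the training example above
sizes = [freq_bins, int(2 * freq_bins), int(0.75 * freq_bins), n_labels]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
print(per_layer, sum(per_layer))
```

That is about 35k parameters in total. With the layout used in `examples/train.py` (a second hidden layer of `freq_bins` units), the same function gives 40502.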

At the end of the process, you should find your TensorFlow model saved under `~/models/sound-detect`.
You can use it in your scripts to classify audio samples from audio sources.

Classifying audio samples
-------------------------

One use case is to analyze an audio file and use the model to detect specific sounds. Example:

```python
import os

from micmon.audio import AudioFile
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
cur_seconds = 60
sample_duration = 2

with AudioFile('/path/to/some/audio.mp3',
               start=cur_seconds, duration='10:00',
               sample_duration=sample_duration) as reader:
    for sample in reader:
        prediction = model.predict(sample)
        print(f'Audio segment at {cur_seconds} seconds: {prediction}')
        cur_seconds += sample_duration
```
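
Per-segment predictions are easier to act on once consecutive positive segments are merged into events with a start and end time. The sketch below does that. It assumes each prediction is a label string such as `'positive'` (the notebook output in this commit prints labels like `negative`), and that segments are `sample_duration` seconds long, starting at `start`. `merge_events` is a hypothetical helper, not part of micmon.

```python
from typing import Iterable, List, Optional, Tuple


def merge_events(predictions: Iterable[str], start: float, sample_duration: float,
                 target: str = 'positive') -> List[Tuple[float, float]]:
    """Return (start_s, end_s) ranges where consecutive segments were `target`."""
    events = []
    event_start: Optional[float] = None
    t = start
    for label in predictions:
        if label == target and event_start is None:
            event_start = t                   # an event begins at this segment
        elif label != target and event_start is not None:
            events.append((event_start, t))   # the event ended before this segment
            event_start = None
        t += sample_duration
    if event_start is not None:               # event still open at the end
        events.append((event_start, t))
    return events


preds = ['negative', 'positive', 'positive', 'negative', 'positive']
print(merge_events(preds, start=60, sample_duration=2))
```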

Another use case is to analyze live audio samples captured from an audio device, e.g. a USB microphone.
Example:

```python
import os

from micmon.audio import AudioDevice
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
audio_system = 'alsa'        # Supported: alsa and pulse
audio_device = 'plughw:1,0'  # Get list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()   # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording
```

You can use these two examples as blueprints to set up your own automation routines
with sound detection.
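
A single positive 2-second segment can be a false alarm, so an automation routine may want to react only when several recent segments agree. The sketch below implements that idea with a sliding window. The `Debouncer` class, the window size and the threshold are illustrative assumptions, not part of micmon. In the live example above, `update()` would be called with each `prediction` (assumed to be a label string).

```python
from collections import deque


class Debouncer:
    """Fire once when at least `threshold` of the last `window` labels match `target`."""

    def __init__(self, window: int = 5, threshold: int = 3, target: str = 'positive'):
        self.recent = deque(maxlen=window)  # oldest labels drop out automatically
        self.threshold = threshold
        self.target = target
        self.active = False

    def update(self, label: str) -> bool:
        """Record a prediction; return True only when the alert condition starts."""
        self.recent.append(label)
        hits = sum(1 for item in self.recent if item == self.target)
        triggered = hits >= self.threshold
        fired = triggered and not self.active  # rising edge only
        self.active = triggered
        return fired


d = Debouncer(window=5, threshold=3)
stream = ['negative', 'positive', 'positive', 'negative', 'positive', 'positive', 'negative']
print([d.update(label) for label in stream])
```

Firing only on the rising edge means a long crying episode triggers one notification rather than one per segment.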

examples/predict_from_audio_file.py (normal file, 16 changed lines)

@@ -0,0 +1,16 @@
import os

from micmon.audio import AudioFile
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)
cur_seconds = 60
sample_duration = 2

with AudioFile('/path/to/some/audio.mp3', start=cur_seconds, duration='10:00',
               sample_duration=sample_duration) as reader:
    for sample in reader:
        prediction = model.predict(sample)
        print(f'Audio segment at {cur_seconds} seconds: {prediction}')
        cur_seconds += sample_duration

examples/predict_from_microphone.py (normal file, 18 changed lines)

@@ -0,0 +1,18 @@
import os

from micmon.audio import AudioDevice
from micmon.model import Model

# Path to a previously saved sound detection Tensorflow model
model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)

audio_system = 'alsa'  # Supported: alsa and pulse
audio_device = 'plughw:1,0'  # Get list of recognized input devices with arecord -l

with AudioDevice(audio_system, device=audio_device) as source:
    for sample in source:
        source.pause()  # Pause recording while we process the frame
        prediction = model.predict(sample)
        print(prediction)
        source.resume()  # Resume recording

examples/train.py (normal file, 54 changed lines)

@@ -0,0 +1,54 @@
# This script shows how to train a neural network to detect sounds given a training set of collected frequency
# spectrum data.

import os
from keras import layers

from micmon.dataset import Dataset
from micmon.model import Model

# This is a directory that contains the saved .npz dataset files
datasets_dir = os.path.expanduser('~/datasets/baby-monitor/datasets')

# This is the output directory where the model will be saved
model_dir = os.path.expanduser(os.path.join('~', 'models', 'baby-monitor'))

# This is the number of training epochs for each dataset sample
epochs = 2

# This value establishes the share of the dataset to be held out for validation
validation_split = 0.3

# Load the datasets from the compressed files
datasets = Dataset.scan(datasets_dir, validation_split=validation_split)

# Get the number of frequency bins
freq_bins = len(datasets[0].samples[0])

# Create a network with 4 layers (one input layer, two intermediate layers and one output layer).
# The first intermediate layer in this example will have twice the number of units as the number
# of input units, while the second intermediate layer will have as many units as the number of
# input units. We also specify the names for the labels and the low and high frequency range
# used when sampling.
model = Model(
    [
        layers.Input(shape=(freq_bins,)),
        layers.Dense(int(2.0 * freq_bins), activation='relu'),
        layers.Dense(int(freq_bins), activation='relu'),
        layers.Dense(len(datasets[0].labels), activation='softmax'),
    ],
    labels=['negative', 'positive'],
    low_freq=datasets[0].low_freq,
    high_freq=datasets[0].high_freq,
)

# Train the model
for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation set loss and accuracy: {evaluation}')

# Save the model
model.save(model_dir, overwrite=True)

@@ -1,4 +1,4 @@
import logging
import sys

logging.basicConfig(level=logging.DEBUG, stream=sys.stdout)
logging.basicConfig(level=logging.INFO, stream=sys.stdout)

@@ -1,24 +1,40 @@
import json
import os
import pathlib
from typing import Optional, List, Tuple, Union

from micmon.audio import AudioDirectory, AudioSegment, AudioSource
from micmon.audio import AudioSegment, AudioSource, AudioDirectory


class AudioFile(AudioSource):
    def __init__(self, path: AudioDirectory,
    def __init__(self,
                 audio_file: Union[str, AudioDirectory],
                 labels_file: Optional[str] = None,
                 start: Union[str, int, float] = 0,
                 duration: Optional[Union[str, int, float]] = None,
                 *args, **kwargs):
        super().__init__(*args, **kwargs)
        if isinstance(audio_file, AudioDirectory):
            audio_file = audio_file.audio_file
            labels_file = audio_file.labels_file

        self.audio_file = os.path.abspath(os.path.expanduser(audio_file))

        if not labels_file:
            labels_file = os.path.join(pathlib.Path(self.audio_file).parent, 'labels.json')
            if not os.path.isfile(labels_file):
                labels_file = None

        self.labels_file = os.path.abspath(os.path.expanduser(labels_file)) if labels_file else None
        self.ffmpeg_args = (
            self.ffmpeg_bin, '-i', path.audio_file, *(('-ss', str(start)) if start else ()),
            self.ffmpeg_bin, '-i', audio_file, *(('-ss', str(start)) if start else ()),
            *(('-t', str(duration)) if duration else ()), *self.ffmpeg_base_args
        )

        self.start = self.convert_time(start)/1000
        self.duration = self.convert_time(duration)/1000
        self.segments = self.parse_labels_file(path.labels_file) \
            if path.labels_file else []
        self.segments = self.parse_labels_file(labels_file) \
            if labels_file else []

        self.labels = sorted(list(set(label for timestamp, label in self.segments)))
        self.cur_time = self.start

@@ -53,4 +69,3 @@ class AudioFile(AudioSource):
                return audio

        raise StopIteration


@@ -2,7 +2,7 @@ import json
import os
import numpy as np

from typing import List, Optional, Union, Tuple
from typing import List, Optional, Union
from keras import Sequential, losses, optimizers, metrics
from keras.layers import Layer
from keras.models import load_model, Model as _Model

@@ -20,10 +20,11 @@ class Model:
                 model: Optional[_Model] = None, optimizer: Union[str, optimizers.Optimizer] = 'adam',
                 loss: Union[str, losses.Loss] = losses.SparseCategoricalCrossentropy(from_logits=True),
                 metrics: List[Union[str, metrics.Metric]] = ('accuracy',),
                 cutoff_frequencies: Tuple[int, int] = (AudioSegment.default_low_freq, AudioSegment.default_high_freq)):
                 low_freq: int = AudioSegment.default_low_freq,
                 high_freq: int = AudioSegment.default_high_freq):
        assert layers or model
        self.label_names = labels
        self.cutoff_frequencies = list(map(int, cutoff_frequencies))
        self.cutoff_frequencies = (int(low_freq), int(high_freq))

        if layers:
            self._model = Sequential(layers)

@@ -74,4 +75,4 @@ class Model:
        with open(freq_file, 'r') as f:
            frequencies = json.load(f)

        return cls(model=model, labels=label_names, cutoff_frequencies=frequencies)
        return cls(model=model, labels=label_names, low_freq=frequencies[0], high_freq=frequencies[1])

micmon/utils/__init__.py (normal file, 0 changed lines)

micmon/utils/datagen.py (normal file, 117 changed lines)

@@ -0,0 +1,117 @@
import argparse
import logging
import os
import sys

from micmon.audio import AudioDirectory, AudioFile, AudioSegment
from micmon.dataset import DatasetWriter

logger = logging.getLogger(__name__)
defaults = {
    'sample_duration': 2.0,
    'sample_rate': 44100,
    'channels': 1,
    'ffmpeg_bin': 'ffmpeg',
}


def create_dataset(audio_dir: str, dataset_dir: str,
                   low_freq: int = AudioSegment.default_low_freq,
                   high_freq: int = AudioSegment.default_high_freq,
                   bins: int = AudioSegment.default_bins,
                   sample_duration: float = defaults['sample_duration'],
                   sample_rate: int = defaults['sample_rate'],
                   channels: int = defaults['channels'],
                   ffmpeg_bin: str = defaults['ffmpeg_bin']):
    audio_dir = os.path.abspath(os.path.expanduser(audio_dir))
    dataset_dir = os.path.abspath(os.path.expanduser(dataset_dir))
    audio_dirs = AudioDirectory.scan(audio_dir)

    for audio_dir in audio_dirs:
        dataset_file = os.path.join(dataset_dir, os.path.basename(audio_dir.path) + '.npz')
        logger.info(f'Processing audio sample {audio_dir.path}')

        with AudioFile(audio_dir.audio_file, audio_dir.labels_file,
                       sample_duration=sample_duration, sample_rate=sample_rate, channels=channels,
                       ffmpeg_bin=os.path.expanduser(ffmpeg_bin)) as reader, \
                DatasetWriter(dataset_file, low_freq=low_freq, high_freq=high_freq, bins=bins) as writer:
            for sample in reader:
                writer += sample


def main():
    # noinspection PyTypeChecker
    parser = argparse.ArgumentParser(
        description='''
Tool to create numpy dataset files with audio spectrum data from a set of labelled raw audio files.''',

        epilog='''
- audio_dir should contain a list of sub-directories, each of which represents a labelled audio sample.
  audio_dir should have the following structure:

  audio_dir/
    -> train_sample_1
      -> audio.mp3
      -> labels.json
    -> train_sample_2
      -> audio.mp3
      -> labels.json
    ...

- labels.json is a key-value JSON file that contains the labels for each audio segment. Example:

  {
    "00:00": "negative",
    "02:13": "positive",
    "04:57": "negative",
    "15:41": "positive",
    "18:24": "negative"
  }

  Each entry indicates that all the audio samples between the specified timestamp and the next entry or
  the end of the audio file should be applied the specified label.

- dataset_dir is the directory where the generated labelled spectrum dataset in .npz format will be saved.
  Each dataset file will be named like its associated audio samples directory.''',

        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument('audio_dir', help='Directory containing the raw audio samples directories to be scanned.')
    parser.add_argument('dataset_dir', help='Destination directory for the compressed .npz files containing the '
                                            'frequency spectrum datasets.')
    parser.add_argument('--low', help='Specify the lowest frequency to be considered in the generated frequency '
                                      'spectrum. Default: 20 Hz (lowest possible frequency audible to a human ear).',
                        required=False, default=AudioSegment.default_low_freq, dest='low_freq', type=int)

    parser.add_argument('--high', help='Specify the highest frequency to be considered in the generated frequency '
                                       'spectrum. Default: 20 kHz (highest possible frequency audible to a human ear).',
                        required=False, default=AudioSegment.default_high_freq, dest='high_freq', type=int)

    parser.add_argument('-b', '--bins', help=f'Specify the number of frequency bins to be used for the spectrum '
                                             f'analysis (default: {AudioSegment.default_bins})',
                        required=False, default=AudioSegment.default_bins, dest='bins', type=int)

    parser.add_argument('-d', '--sample-duration', help=f'The script will calculate the spectrum of audio segments of '
                                                        f'this specified length in seconds (default: '
                                                        f'{defaults["sample_duration"]}).',
                        required=False, default=defaults['sample_duration'], dest='sample_duration', type=float)

    parser.add_argument('-r', '--sample-rate', help=f'Audio sample rate (default: {defaults["sample_rate"]} Hz)',
                        required=False, default=defaults['sample_rate'], dest='sample_rate', type=int)

    parser.add_argument('-c', '--channels', help=f'Number of destination audio channels (default: '
                                                 f'{defaults["channels"]})',
                        required=False, default=defaults['channels'], dest='channels', type=int)

    parser.add_argument('--ffmpeg', help=f'Absolute path to the ffmpeg executable (default: {defaults["ffmpeg_bin"]})',
                        required=False, default=defaults['ffmpeg_bin'], dest='ffmpeg_bin', type=str)

    opts, args = parser.parse_known_args(sys.argv[1:])
    return create_dataset(audio_dir=opts.audio_dir, dataset_dir=opts.dataset_dir, low_freq=opts.low_freq,
                          high_freq=opts.high_freq, bins=opts.bins, sample_duration=opts.sample_duration,
                          sample_rate=opts.sample_rate, channels=opts.channels, ffmpeg_bin=opts.ffmpeg_bin)


if __name__ == '__main__':
    main()

File diff suppressed because one or more lines are too long

@@ -46,9 +46,7 @@
"from micmon.audio import AudioDevice, AudioPlayer\n",
"from micmon.model import Model\n",
"\n",
"basedir = os.path.expanduser(os.path.join('~', 'projects', 'baby-monitor'))\n",
"models_dir = os.path.join(basedir, 'models')\n",
"model_path = os.path.join(models_dir, 'baby-monitor')\n",
"model_dir = os.path.expanduser(os.path.join('~', 'models', 'baby-monitor'))\n",
"audio_system = 'alsa'\n",
"audio_device = 'plughw:3,0'\n",
"label_names = ['negative', 'positive']"

@@ -68,7 +66,7 @@
"execution_count": 2,
"outputs": [],
"source": [
"model = Model.load(model_path)"
"model = Model.load(model_dir)"
],
"metadata": {
"collapsed": false,

@@ -122,23 +120,17 @@
"name": "stdout",
"output_type": "stream",
"text": [
"negative\n",
"negative\n",
"negative\n",
"negative\n",
"negative\n",
"negative\n",
"negative\n",
"negative\n",
"negative\n",
"negative\n"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001B[0;31m---------------------------------------------------------------------------\u001B[0m",
"\u001B[0;31mKeyboardInterrupt\u001B[0m Traceback (most recent call last)",
"\u001B[0;32m<ipython-input-3-27c83a302cb8>\u001B[0m in \u001B[0;36m<module>\u001B[0;34m\u001B[0m\n\u001B[1;32m 1\u001B[0m \u001B[0;32mwith\u001B[0m \u001B[0mAudioDevice\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0maudio_system\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mdevice\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0maudio_device\u001B[0m\u001B[0;34m)\u001B[0m \u001B[0;32mas\u001B[0m \u001B[0msource\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m----> 2\u001B[0;31m \u001B[0;32mfor\u001B[0m \u001B[0msample\u001B[0m \u001B[0;32min\u001B[0m \u001B[0msource\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 3\u001B[0m \u001B[0msource\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mpause\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 4\u001B[0m \u001B[0mprediction\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mmodel\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mpredict\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0msample\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 5\u001B[0m \u001B[0mprint\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mprediction\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
"\u001B[0;32m~/projects/baby-monitor/micmon/audio/source.py\u001B[0m in \u001B[0;36m__next__\u001B[0;34m(self)\u001B[0m\n\u001B[1;32m 36\u001B[0m \u001B[0;32mraise\u001B[0m \u001B[0mStopIteration\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 37\u001B[0m \u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0;32m---> 38\u001B[0;31m \u001B[0mdata\u001B[0m \u001B[0;34m=\u001B[0m \u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mffmpeg\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mstdout\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mread\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mbufsize\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[0m\u001B[1;32m 39\u001B[0m \u001B[0;32mif\u001B[0m \u001B[0mdata\u001B[0m\u001B[0;34m:\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n\u001B[1;32m 40\u001B[0m \u001B[0;32mreturn\u001B[0m \u001B[0mAudioSegment\u001B[0m\u001B[0;34m(\u001B[0m\u001B[0mdata\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0msample_rate\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0msample_rate\u001B[0m\u001B[0;34m,\u001B[0m \u001B[0mchannels\u001B[0m\u001B[0;34m=\u001B[0m\u001B[0mself\u001B[0m\u001B[0;34m.\u001B[0m\u001B[0mchannels\u001B[0m\u001B[0;34m)\u001B[0m\u001B[0;34m\u001B[0m\u001B[0;34m\u001B[0m\u001B[0m\n",
"\u001B[0;31mKeyboardInterrupt\u001B[0m: "
]
}
],
"source": [

@@ -147,7 +139,7 @@
" source.pause()\n",
" prediction = model.predict(sample)\n",
" print(prediction)\n",
" source.resume()"
" source.resume()\n"
],
"metadata": {
"collapsed": false,

File diff suppressed because one or more lines are too long

requirements.txt (normal file, 3 changed lines)

@@ -0,0 +1,3 @@
numpy
tensorflow
keras

setup.py (executable file, 40 changed lines)

@@ -0,0 +1,40 @@
#!/usr/bin/env python
import os

from setuptools import setup, find_packages


def path(fname=''):
    return os.path.abspath(os.path.join(os.path.dirname(__file__), fname))


def readfile(fname):
    with open(path(fname)) as f:
        return f.read()


setup(
    name="micmon",
    version="0.1",
    author="Fabio Manganiello",
    author_email="info@fabiomanganiello.com",
    description="Programmable Tensorflow-based sound/noise detector",
    license="MIT",
    python_requires='>= 3.6',
    keywords="machine-learning tensorflow sound-detection",
    url="https://github.com/BlackLight/micmon",
    packages=find_packages(),
    include_package_data=True,
    long_description=readfile('README.md'),
    long_description_content_type='text/markdown',
    entry_points={
        'console_scripts': [
            'micmon-datagen=micmon.utils.datagen:main',
        ],
    },
    classifiers=[
        "Topic :: Utilities",
        "License :: OSI Approved :: MIT License",
        "Development Status :: 3 - Alpha",
    ],
)