Since my neighbor has a new girlfriend who is quite active (viewtopic.php?p=106595#p106595), I've been looking for a way to record continuously in the hallway.
So I bought a cheap mini USB recorder from a famous Chinese website. The recording quality isn't great (probably because there's no real microphone), but it has two useful properties:
- files are timestamped
- records can be split
Workflow
- record for about a day, with files split by hour
- copy all files to my computer
- use Audacity's "pipe" mode to run a macro that applies "Truncate Silence" with the threshold just above the noise level, so the resulting files contain no silence
- use segmentClassifyFile from https://github.com/tyiannak/pyAudioAnalysis to mark interesting timestamps
- validate the results manually
- edit the files manually (high/low pass, cuts, denoise)
Machine learning training
pyAudioAnalysis's training was done as follows:
- using https://sonicvisualiser.org/ to label segments
- then splitting using https://github.com/tyiannak/pyAudioAnal ... Extraction featureExtractionFile to generate labeled files
- running the loop below to train and test each method:
Code:
for method in knn svm extratrees gradientboosting randomforest ; do
    python3 ../pyAudioAnalysis/pyAudioAnalysis/audioAnalysis.py trainClassifier -i labeled/* --method "$method" -o "../models/${method}-classifier.out" | tee "../models/${method}-train.txt"
done
Machine learning file segmentation
Code:
for i in macro-output/*WAV ; do
    # keep only the lines of the target class, and drop the detection in the first second of each file
    python3 ../pyAudioAnalysis/pyAudioAnalysis/audioAnalysis.py segmentClassifyFile -i "$i" --model svm --modelName ../models/svm-classifier3.out | grep ': sex' | grep -v -F '0:00:00 - 0:00:01: sex' > "${i}-analysis.txt"
done
1. false positives (scattered short timestamps)
Code:
0:01:11 - 0:01:12: sex
0:02:49 - 0:02:50: sex
0:05:06 - 0:05:07: sex
0:05:26 - 0:05:27: sex
0:07:03 - 0:07:04: sex
0:08:20 - 0:08:21: sex
Code:
0:00:06 - 0:00:08: sex
0:00:10 - 0:00:11: sex
0:00:13 - 0:00:14: sex
0:00:16 - 0:00:18: sex
0:00:21 - 0:00:22: sex
0:01:10 - 0:01:11: sex
0:01:12 - 0:01:13: sex
0:01:15 - 0:01:16: sex
0:01:21 - 0:01:22: sex
0:01:25 - 0:01:26: sex
0:01:30 - 0:01:31: sex
0:01:42 - 0:01:44: sex
0:01:53 - 0:01:54: sex
0:01:57 - 0:01:58: sex
0:02:01 - 0:02:02: sex
0:02:06 - 0:02:07: sex
0:02:10 - 0:02:11: sex
0:02:13 - 0:02:15: sex
0:04:32 - 0:04:33: sex
0:04:46 - 0:04:47: sex
0:04:48 - 0:04:49: sex
0:06:44 - 0:06:45: sex
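The manual validation step could probably be made lighter by filtering on how tightly the detections are grouped, since the two listings above differ mostly in that respect. Below is a minimal Python sketch of that idea, not something I actually ran in the workflow: it parses the `H:MM:SS - H:MM:SS: label` lines written to the `-analysis.txt` files, merges detections that are close together, and keeps only groups with several hits. The function names and the `max_gap` / `min_hits` values are assumptions made up for illustration and would need tuning against real recordings.

```python
import re

# Matches lines such as "0:01:11 - 0:01:12: sex"
LINE_RE = re.compile(r"(\d+):(\d\d):(\d\d) - (\d+):(\d\d):(\d\d): (\S+)")


def parse_segments(text, label):
    """Return (start, end) pairs in seconds for the lines carrying `label`."""
    segments = []
    for line in text.splitlines():
        m = LINE_RE.fullmatch(line.strip())
        if m and m.group(7) == label:
            h1, m1, s1, h2, m2, s2 = (int(g) for g in m.groups()[:6])
            segments.append((h1 * 3600 + m1 * 60 + s1,
                             h2 * 3600 + m2 * 60 + s2))
    return segments


def cluster(segments, max_gap=10):
    """Merge segments separated by at most `max_gap` seconds.

    Returns a list of (start, end, hit_count) tuples.
    """
    clusters = []
    for start, end in sorted(segments):
        if clusters and start - clusters[-1][1] <= max_gap:
            prev_start, prev_end, hits = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end), hits + 1)
        else:
            clusters.append((start, end, 1))
    return clusters


def dense_clusters(segments, max_gap=10, min_hits=3):
    """Keep only clusters made of at least `min_hits` detections."""
    return [c for c in cluster(segments, max_gap) if c[2] >= min_hits]


if __name__ == "__main__":
    sample = """0:01:11 - 0:01:12: sex
0:02:49 - 0:02:50: sex
0:05:06 - 0:05:07: sex"""
    # Isolated one-second hits never reach min_hits, so nothing is kept.
    print(dense_clusters(parse_segments(sample, "sex")))
```

With these (arbitrary) thresholds, isolated one-second hits like those in listing 1 are all discarded, while runs of hits a few seconds apart are reported as a single time range with a hit count. The results would still need to be checked by ear.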
I think the trained model only works for this particular combination of recorder and neighbor's voice. It would probably be quite easy to train something more advanced (ResNet or something like that) given enough samples.