CaR-FOREST: Joint Classification-Regression Decision Forests for Overlapping Audio Event Detection

This report describes our submissions to Task2 and Task3 of the DCASE 2016 challenge. The systems aim at dealing with the detection of overlapping audio events in continuous streams, where the detectors are based on random decision forests. The proposed forests are jointly trained for classification and regression simultaneously. Initially, the training is classification-oriented to encourage the trees to select discriminative features from overlapping mixtures to separate positive audio segments from the negative ones. The regression phase is then carried out to let the positive audio segments vote for the event onsets and offsets, and therefore model the temporal structure of audio events. One random decision forest is specifically trained for each event category of interest. Experimental results on the development data show that our systems significantly outperform the baseline on the Task2 evaluation while they are inferior to the baseline in the Task3 evaluation.

[1]  Tuomas Virtanen,et al.  TUT database for acoustic scene classification and sound event detection , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).

[2]  Stefan Goetze,et al.  Detection and Classification of Acoustic Events for In-Home Care , 2011 .

[3]  Guillaume Lemaitre,et al.  Real-Time Detection of Overlapping Sound Events with Non-Negative Matrix Factorization , 2013 .

[4]  Haizhou Li,et al.  Jump Function Kolmogorov for overlapping audio event classification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  D. Wang,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2008, IEEE Trans. Neural Networks.

[6]  Gernot A. Fink,et al.  A Bag-of-Features approach to acoustic event detection , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Guy J. Brown,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .

[8]  Huy Phan,et al.  Learning compact structural representations for audio events using regressor banks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9]  Annamaria Mesaros,et al.  Sound Event Detection in Multisource Environments Using Source Separation , 2011 .

[10]  Rainer Martin,et al.  Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..

[11]  Huy Phan,et al.  A multi-channel fusion framework for audio event detection , 2015, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[12]  Augusto Sarti,et al.  Scream and gunshot detection and localization for audio-surveillance systems , 2007, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance.

[13]  Huy Phan,et al.  Acoustic event detection and localization with regression forests , 2014, INTERSPEECH.

[14]  Richard F. Lyon,et al.  Machine Hearing: An Emerging Field , 2010 .

[15]  Huy Phan,et al.  Random Regression Forests for Acoustic Event Detection and Classification , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[16]  Heikki Huttunen,et al.  Polyphonic sound event detection using multi label deep neural networks , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[17]  Thomas S. Huang,et al.  Real-world acoustic event detection , 2010, Pattern Recognit. Lett..

[18]  Andrey Temko,et al.  Acoustic event detection in meeting-room environments , 2009, Pattern Recognit. Lett..

[19]  Huy Phan,et al.  Early event detection in audio streams , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).

[20]  Tuomas Virtanen,et al.  Acoustic event detection in real life recordings , 2010, 2010 18th European Signal Processing Conference.

[21]  Gernot A. Fink,et al.  Temporal Acoustic Words for Online Acoustic Event Detection , 2015, GCPR.