Random Regression Forests for Acoustic Event Detection and Classification

Despite the success of the automatic speech recognition framework in its own application field, its adaptation to acoustic event detection has met with limited success. In this paper, instead of treating the problem like the segmentation and classification tasks of speech recognition, we pose it as a regression task and propose an approach based on random forest regression. This formulation also lets event detection and temporal localization be handled efficiently as a joint problem. We first decompose the training audio signals into multiple interleaved superframes, each annotated with its event class label and its displacements to the temporal onset and offset of the event. For each event category, a random forest regression model is learned from this displacement information. Given an unseen superframe, the learned regressor outputs continuous estimates of the event's onset and offset locations. To handle multiple event categories, a superframe-wise recognition phase precedes the category-specific regression phase; it rejects background superframes and classifies event superframes into the different event categories. Jointly posing event detection and localization as a regression problem is novel, and the superior performance on two databases, ITC-Irst and UPC-TALP, demonstrates the efficiency and potential of the proposed approach.
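The pipeline described above can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation. The superframe length and hop (`SF_LEN`, `SF_HOP`), the rule that labels a superframe by the event containing its centre, the hand-written forest (bootstrap samples plus a random subset of roughly a third of the features at each node, in the style of Breiman's random forests), the median used to combine per-superframe estimates, and the synthetic `toy_signal` "knock" data are all hypothetical choices. The superframe-wise recognition phase is not implemented; ground-truth labels stand in for it when the event superframes are selected.

```python
import math
import random
import statistics

# Hypothetical settings (not from the paper): superframe length and hop, in frames.
SF_LEN, SF_HOP = 10, 5

def make_superframes(features, events):
    """Cut per-frame feature vectors into overlapping superframes.

    events holds (label, onset, offset) tuples in frames, offset exclusive.
    Background superframes get label None and no displacement target.
    """
    samples = []
    for start in range(0, len(features) - SF_LEN + 1, SF_HOP):
        centre = start + SF_LEN // 2
        # Stack the frames so that one superframe becomes one feature vector.
        vector = [x for frame in features[start:start + SF_LEN] for x in frame]
        label, disp = None, None
        for name, on, off in events:
            if on <= centre < off:
                # Distance back to the onset and forward to the offset.
                label, disp = name, (centre - on, off - centre)
                break
        samples.append((vector, label, disp, centre))
    return samples

def _mean2(ys):
    # Component-wise mean of (onset, offset) displacement pairs.
    return tuple(sum(y[k] for y in ys) / len(ys) for k in range(2))

def _sse(rows):
    ys = [y for _, y in rows]
    m = _mean2(ys)
    return sum((y[k] - m[k]) ** 2 for y in ys for k in range(2))

def grow_tree(rows, depth, n_feats, rng, min_leaf=2):
    """Regression tree: per node, test a random feature subset, keep the lowest-SSE split."""
    best = None
    if depth > 0 and len(rows) >= 2 * min_leaf:
        for f in rng.sample(range(len(rows[0][0])), n_feats):
            for thr in sorted({x[f] for x, _ in rows}):
                left = [r for r in rows if r[0][f] < thr]
                right = [r for r in rows if r[0][f] >= thr]
                if min(len(left), len(right)) < min_leaf:
                    continue
                cost = _sse(left) + _sse(right)
                if best is None or cost < best[0]:
                    best = (cost, f, thr, left, right)
    if best is None:
        return ("leaf", _mean2([y for _, y in rows]))
    _, f, thr, left, right = best
    return ("split", f, thr, grow_tree(left, depth - 1, n_feats, rng, min_leaf),
            grow_tree(right, depth - 1, n_feats, rng, min_leaf))

def predict_tree(node, x):
    while node[0] == "split":
        _, f, thr, left, right = node
        node = left if x[f] < thr else right
    return node[1]

def fit_forest(rows, n_trees=30, depth=6, seed=0):
    """Bootstrap samples plus roughly a third of the features tried at each node."""
    rng = random.Random(seed)
    n_feats = max(1, len(rows[0][0]) // 3)
    return [grow_tree([rng.choice(rows) for _ in rows], depth, n_feats, rng)
            for _ in range(n_trees)]

def predict_forest(forest, x):
    # Average the (onset, offset) displacements predicted by all trees.
    return _mean2([predict_tree(tree, x) for tree in forest])

def localize(forest, event_superframes):
    """Each event superframe votes for an onset and an offset; the median is an assumed combiner."""
    onsets, offsets = [], []
    for vector, _, _, centre in event_superframes:
        d_on, d_off = predict_forest(forest, vector)
        onsets.append(centre - d_on)
        offsets.append(centre + d_off)
    return statistics.median(onsets), statistics.median(offsets)

def toy_signal(onset, n_frames=100, length=30):
    # Hypothetical 1-D frame feature: an exponentially decaying "knock" event.
    feats = [[math.exp(-(t - onset) / 10.0) if onset <= t < onset + length else 0.0]
             for t in range(n_frames)]
    return feats, [("knock", onset, onset + length)]

train = []
for on in (12, 23, 31, 44, 50):
    feats, events = toy_signal(on)
    train += [(v, d) for v, lab, d, _ in make_superframes(feats, events) if lab == "knock"]
forest = fit_forest(train)

feats, events = toy_signal(48)
# Ground-truth labels stand in for the superframe-wise recognition phase (not implemented).
test = [s for s in make_superframes(feats, events) if s[1] == "knock"]
onset_est, offset_est = localize(forest, test)
print(round(onset_est, 1), round(offset_est, 1))  # ground truth for comparison: 48 and 78
```

Because every leaf stores the mean of (onset, offset) displacement pairs, one forest predicts both boundaries at once. In a real system a library regressor, for example scikit-learn's `RandomForestRegressor` (which accepts two-column targets), would replace the hand-written trees, and a separate classifier would supply the superframe labels.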
