Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations

Existing methods for detecting overlapping sound events in audio often rely on matrix factorization, assigning the separated components to event classes. We present a method that bypasses this supervised construction of class models. The method learns the components as a non-negative dictionary in a coupled matrix factorization problem, in which the spectral representation and the class activity annotation of the audio signal share the same activation matrix. At test time, the dictionaries are used to estimate the class activations directly. To cope with large amounts of training data, two methods are proposed for reducing the size of the dictionary. The methods were evaluated on a database of real-life recordings and outperformed previous approaches by over 10%.
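The coupled factorization described above can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: it assumes a squared-error objective with standard multiplicative updates (the paper may use a different divergence), and the toy data, dimensions, and variable names (`S`, `A`, `Ds`, `Da`, `X`) are all hypothetical. The key idea it shows is that the spectrogram `S` and the class-activity annotation `A` are factorized jointly through one shared activation matrix `X`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: spectrogram S (freq bins x frames) and a binary class-activity
# annotation A (classes x frames); the two share the time (frame) axis.
F, T, C, K = 40, 200, 3, 6   # freq bins, frames, classes, dictionary atoms
S = rng.random((F, T))
A = (rng.random((C, T)) > 0.5).astype(float)

# One dictionary per modality, plus the shared non-negative activations.
Ds = rng.random((F, K)) + 1e-3   # spectral dictionary
Da = rng.random((C, K)) + 1e-3   # class-activity dictionary
X  = rng.random((K, T)) + 1e-3   # shared activation matrix

eps = 1e-9
for _ in range(200):
    # Multiplicative updates minimizing ||S - Ds X||^2 + ||A - Da X||^2;
    # the X update couples both factorizations through its numerator.
    Ds *= (S @ X.T) / (Ds @ X @ X.T + eps)
    Da *= (A @ X.T) / (Da @ X @ X.T + eps)
    X  *= (Ds.T @ S + Da.T @ A) / (Ds.T @ Ds @ X + Da.T @ Da @ X + eps)

# At test time, Ds would be kept fixed, activations X_test inferred from a
# new spectrogram alone, and class activity predicted as Da @ X_test.
```

Because the annotations enter the training objective directly, no separate step is needed to assign dictionary atoms to event classes: the learned `Da` already maps activations to class activity.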
