Acoustic Event Classification using spectral band selection and Non-Negative Matrix Factorization-based features

We propose a new front-end for Acoustic Event Classification (AEC) tasks. It consists of two stages: short-time feature extraction and temporal integration. The first module relies on mutual information-based frequency band selection. The second module is based on Non-Negative Matrix Factorization (NMF). Results show that it outperforms the baseline system in clean and noisy conditions.

Feature extraction methods for sound events have traditionally been based on parametric representations developed specifically for speech signals, such as the well-known Mel-Frequency Cepstral Coefficients (MFCCs). However, the discrimination capabilities of these features for Acoustic Event Classification (AEC) tasks could be enhanced by taking into account the spectro-temporal structure of acoustic event signals. In this paper, a new front-end for AEC that incorporates this specific information is proposed. It consists of two stages: short-time feature extraction and temporal feature integration. The first stage aims to provide a better frame-by-frame spectral representation of the different acoustic events by automatically selecting the optimal set of frequency bands from which cepstral-like features are extracted. The second stage is designed to capture the most relevant temporal information in the short-time features by applying Non-Negative Matrix Factorization (NMF) to their periodograms computed over long audio segments. The whole front-end has been evaluated in clean and noisy conditions. Experiments show that removing certain frequency bands from the short-time feature computation (mainly in the middle region of the spectrum for clean conditions and at low frequencies for noisy environments), in conjunction with the NMF technique for temporal feature integration, significantly improves the performance of a Support Vector Machine (SVM) based AEC system with respect to conventional MFCCs.
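
The temporal integration stage described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `feature_periodograms` and `nmf`, the non-overlapping segmentation, the segment length, the factorization rank, and the choice of the Euclidean cost with multiplicative updates are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def feature_periodograms(features, seg_len):
    """Periodograms of short-time feature trajectories (hypothetical helper).

    features: array of shape (n_frames, n_coeffs), one row per frame.
    seg_len:  number of frames per long segment (assumed, non-overlapping).
    Returns an array of shape (n_segments * n_coeffs, seg_len // 2 + 1).
    """
    n_frames, n_coeffs = features.shape
    n_segments = n_frames // seg_len
    # Drop trailing frames that do not fill a whole segment.
    trimmed = features[: n_segments * seg_len]
    segments = trimmed.reshape(n_segments, seg_len, n_coeffs)
    # FFT along the time axis of each segment, then squared magnitude.
    spectrum = np.fft.rfft(segments, axis=1)
    power = (np.abs(spectrum) ** 2) / seg_len
    # One row per (segment, coefficient) pair.
    return power.transpose(0, 2, 1).reshape(n_segments * n_coeffs, -1)


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V as W @ H.

    Uses multiplicative updates for the Euclidean cost; the cost function
    is an assumption, not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Under these assumptions, calling `nmf(feature_periodograms(features, seg_len).T, rank)` would give a basis matrix `W` over modulation frequencies and an activation matrix `H` with one column per segment and coefficient pair. The activations could then serve as segment-level features for a classifier such as an SVM.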
