MFCC combined with sparse coding for sound event classification under different noise environments

In recent years, the most popular methods for sound event classification have fallen into two types: 1) extracting Mel-frequency cepstral coefficients (MFCC) or perceptual linear prediction (PLP) features and then training a classifier on them; 2) converting the sound into a spectrogram and then applying image classification methods. However, neither approach has achieved satisfactory performance. To improve classification performance, we present a sound event classification method based on MFCC and sparse coding, which is effective at capturing high-level representations of the input data. The sparse coding coefficients are then used as new sound event features to train the classification model. Our experimental results show that the method is robust and adaptable and yields a clear improvement in sound event classification.
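The feature pipeline described above (MFCC frames, then sparse coding, then coefficients used as classifier input) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sparse codes are computed with ISTA on an L1-regularized least-squares objective, the dictionary is random rather than learned, random arrays stand in for real MFCC frames, and the per-clip feature is obtained by max-pooling over frames. The function names `soft_threshold`, `sparse_codes_ista`, and `clip_feature` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * |.|_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_codes_ista(X, D, lam=0.1, n_iter=200):
    """Approximately minimize 0.5 * ||X - A D||_F^2 + lam * ||A||_1 over A.

    X: (n_frames, n_dims) feature frames (e.g. MFCC vectors).
    D: (n_atoms, n_dims) dictionary, one atom per row.
    Returns A: (n_frames, n_atoms) sparse coefficients.
    """
    # Lipschitz constant of the gradient of the smooth term.
    L = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T          # gradient of the least-squares term
        A = soft_threshold(A - grad / L, lam / L)
    return A


def clip_feature(A):
    """Assumed pooling: max absolute coefficient per atom across frames."""
    return np.abs(A).max(axis=0)


# Stand-in data (assumption): 50 frames of 13 "MFCC" values, 64 random atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 13))
D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms
X = rng.standard_normal((50, 13))

A = sparse_codes_ista(X, D)
feat = clip_feature(A)   # one fixed-length vector per clip, fed to a classifier
```

In a real system the frames would come from an MFCC extractor, the dictionary would be learned from training data, and `feat` vectors from labeled clips would train the classifier; those choices are left open here because the abstract does not specify them.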
