Multiclass Support Vector Machines for Environmental Sounds Classification Using log-Gabor Filters

In this paper, we propose a robust environmental sound classification approach based on spectrogram features derived from log-Gabor filters. The approach comprises two methods. In the first method, each spectrogram is passed through an appropriate log-Gabor filter bank; the filter outputs are averaged and then subjected to an optimal feature selection procedure based on a mutual information criterion. The second method follows the same steps but applies them only to three patches extracted from each spectrogram. To assess the accuracy of the proposed methods, we conduct experiments on a large database containing 10 environmental sound classes. The classification results obtained with multiclass Support Vector Machines show that the second method is the more accurate of the two, with an average classification accuracy of 89.62%.

Keywords: Environmental sounds, Log-Gabor filters, Spectrogram, SVM Multiclass, Visual features.
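The abstract does not specify how the log-Gabor filter bank is built. As a minimal sketch only, the following assumes the standard frequency-domain log-Gabor definition (a Gaussian on a logarithmic frequency axis multiplied by a Gaussian angular term). The function names, the bandwidth values `sigma_ratio` and `sigma_theta`, the maximum centre frequency `f_max`, and the choice of 2 scales and 6 orientations are all illustrative assumptions, not parameters taken from the paper.

```python
import math


def log_gabor_radial(f, f0, sigma_ratio=0.65):
    """Radial log-Gabor gain at frequency f for centre frequency f0.

    sigma_ratio (the bandwidth parameter) is an assumed value, not from the paper.
    """
    if f <= 0.0:
        return 0.0  # the log-Gabor function has no DC component
    return math.exp(-(math.log(f / f0) ** 2) / (2.0 * math.log(sigma_ratio) ** 2))


def fft_frequency(k, n):
    """Normalized frequency (cycles/sample) of bin k in an n-point FFT."""
    return (k if k <= (n - 1) // 2 else k - n) / n


def log_gabor_2d(rows, cols, f0, theta0, sigma_ratio=0.65, sigma_theta=math.pi / 8):
    """Frequency-domain 2-D log-Gabor filter as a rows x cols grid of gains."""
    grid = []
    for r in range(rows):
        v = fft_frequency(r, rows)
        row = []
        for c in range(cols):
            u = fft_frequency(c, cols)
            radius = math.hypot(u, v)
            theta = math.atan2(v, u)
            # Wrap the angular distance into [-pi, pi] before weighting it.
            d_theta = math.atan2(math.sin(theta - theta0), math.cos(theta - theta0))
            angular = math.exp(-(d_theta ** 2) / (2.0 * sigma_theta ** 2))
            row.append(log_gabor_radial(radius, f0, sigma_ratio) * angular)
        grid.append(row)
    return grid


def log_gabor_bank(rows, cols, n_scales=2, n_orientations=6, f_max=0.25):
    """One filter per (scale, orientation); centre frequency halves per scale."""
    return [
        log_gabor_2d(rows, cols, f_max / (2 ** s), o * math.pi / n_orientations)
        for s in range(n_scales)
        for o in range(n_orientations)
    ]
```

Under these assumptions, each grid would be multiplied element-wise with the 2-D Fourier transform of a spectrogram (or of one of its patches), and the magnitudes of the inverse transforms would give the filter responses that are then averaged. The feature selection and SVM stages are not covered by this sketch.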
