Audio Feature and Classifier Analysis for Efficient Recognition of Environmental Sounds

Environmental sounds (ES) have distinctive characteristics, such as an unstructured nature and typically noise-like, flat spectra, which make them harder to recognize than speech or music. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose the most representative feature set, that is, the one yielding the highest recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter, are detected and tested with eleven (11) audio features (the MPEG-7 family, ZCR, MFCC, and combinations of them) using HMM and SVM classifiers. Extensive experiments demonstrate the effectiveness of these joint features for ES classification. Our experiments show that the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the most representative, with an average F-measure of 80.6%.
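For readers unfamiliar with the reported metric, the following is a minimal sketch of how an average F-measure over sound categories can be computed from classifier output. It assumes the average is an unweighted mean of per-class F-measures, which the abstract does not state; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_f_measure(true_labels, predicted_labels):
    """Return {label: F-measure}, with F = 2PR / (P + R) for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        predicted_count = tp[label] + fp[label]
        actual_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / actual_count if actual_count else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def average_f_measure(true_labels, predicted_labels):
    """Unweighted mean of per-class F-measures (an assumed averaging scheme)."""
    scores = per_class_f_measure(true_labels, predicted_labels)
    return sum(scores.values()) / len(scores)


# Toy labels for illustration only; these are not results from the paper.
truth = ["rain", "rain", "wind", "wind", "water", "water"]
pred = ["rain", "wind", "wind", "wind", "water", "rain"]
scores = per_class_f_measure(truth, pred)
average = average_f_measure(truth, pred)
```

With an unweighted mean, every category contributes equally regardless of how many test clips it has, so a poorly recognized category lowers the average as much as a well-populated one would.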
