Environmental sound classification using spectral and harmonic feature combination

Recognition of environmental sounds (ESs) is a challenging problem because these sounds are unstructured and typically have noise-like, flat spectra. In this paper, we propose a composite audio feature that captures the different characteristics of ESs by combining spectral and harmonic audio features. In the experiments, thirteen ES categories, namely emergency alarm, car horn, gun, explosion, automobile, motorcycle, helicopter, water, wind, rain, applause, crowd, and laughter, are detected using the proposed feature set and an SVM classifier. Extensive experiments demonstrate the effectiveness of the proposed joint feature set for ES classification. Our experiments show that the proposed feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is quite successful in recognizing ESs, with an average F-measure of 80.6%.
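
To make the four feature names concrete, the following is a minimal sketch of how flatness, centroid, spread, and a harmonicity measure could be computed for a single audio frame. It is an illustration under stated assumptions, not the paper's implementation: the function names (`power_spectrum`, `spectral_features`, `harmonic_ratio`) are hypothetical, the definitions are simplified textbook versions on a linear frequency axis, and the descriptors used in the paper may be defined differently (for example, per frequency band). The resulting values would be concatenated into one feature vector per frame and passed to an SVM, which is not shown here.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))
        spectrum.append(abs(s) ** 2)
    return spectrum


def spectral_features(frame, sample_rate):
    """Return (flatness, centroid_hz, spread_hz) for one frame.

    Simplified definitions (an assumption of this sketch):
      flatness = geometric mean / arithmetic mean of the power spectrum
      centroid = power-weighted mean frequency
      spread   = power-weighted standard deviation around the centroid
    """
    power = power_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(power))]
    eps = 1e-12  # guards against log(0) and division by zero
    total = sum(power) + eps
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    spread = math.sqrt(variance)
    geometric_mean = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arithmetic_mean = total / len(power)
    flatness = geometric_mean / arithmetic_mean
    return flatness, centroid, spread


def harmonic_ratio(frame, min_lag=2):
    """Largest normalized autocorrelation over lags in [min_lag, N/2).

    Values near 1 suggest a strongly periodic (harmonic) frame;
    noise-like frames give lower values.
    """
    n = len(frame)
    best = 0.0
    for lag in range(min_lag, n // 2):
        head = frame[:n - lag]
        tail = frame[lag:]
        num = sum(a * b for a, b in zip(head, tail))
        den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if den > 0:
            best = max(best, num / den)
    return best


# Usage: a pure 1 kHz tone sampled at 8 kHz (64 samples, 8 full periods).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]
flatness, centroid, spread = spectral_features(tone, sample_rate)
feature_vector = [flatness, centroid, spread, harmonic_ratio(tone)]
```

For a pure tone like the one above, the flatness is close to zero and the harmonic ratio is close to one, whereas a noise-like frame gives a higher flatness and a lower harmonic ratio. This contrast is one plausible reason why combining spectral and harmonic features helps separate tonal sounds such as alarms from noise-like sounds such as wind or rain.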
