Environment Recognition Using Selected MPEG-7 Audio Features and Mel-Frequency Cepstral Coefficients

In this paper, we propose a system for environment recognition that uses selected MPEG-7 audio low-level descriptors together with conventional mel-frequency cepstral coefficients (MFCC). The MPEG-7 descriptors are first ranked by Fisher's discriminant ratio. Principal component analysis is then applied to the 30 top-ranked MPEG-7 descriptors to obtain 13 features. These 13 features are appended to the MFCC features to complete the feature set of the proposed system. Gaussian mixture models (GMMs) are used as the classifier. The system is evaluated on ten different environment sounds. The experimental results show a significant improvement in recognition performance for the proposed system over systems based on MFCC alone or on the full set of MPEG-7 descriptors. For example, the best performance is achieved in the Restaurant environment, where MFCC, full MPEG-7, and the proposed method give 90%, 94%, and 96% accuracy, respectively.
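The feature-construction steps described above (ranking, projection, and concatenation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the multi-class form of Fisher's discriminant ratio used here (variance of the class means divided by the mean within-class variance), the count of 45 candidate MPEG-7 descriptors, the 13 MFCC coefficients, and all function names are hypothetical, and the data are random placeholders rather than real audio features. The GMM classification stage is omitted.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-feature score: variance of class means / mean within-class variance.
    This multi-class form is an assumption; the abstract does not define it."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return means.var(axis=0) / (variances.mean(axis=0) + 1e-12)

def select_and_project(X_mpeg7, y, n_keep=30, n_components=13):
    """Keep the n_keep top-ranked descriptors, then reduce them with PCA."""
    scores = fisher_ratio(X_mpeg7, y)
    top = np.argsort(scores)[::-1][:n_keep]      # indices, best first
    X_top = X_mpeg7[:, top]
    centered = X_top - X_top.mean(axis=0)
    # PCA via SVD: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, top

def build_feature_set(X_mfcc, X_mpeg7, y):
    """Append the PCA-reduced MPEG-7 features to the MFCC features."""
    projected, _ = select_and_project(X_mpeg7, y)
    return np.hstack([X_mfcc, projected])

# Placeholder data: 200 frames, 10 environment classes (sizes are hypothetical).
rng = np.random.default_rng(0)
n_frames = 200
y = rng.integers(0, 10, size=n_frames)
X_mpeg7 = rng.normal(size=(n_frames, 45))
X_mpeg7[:, 0] += y                               # make descriptor 0 discriminative
X_mfcc = rng.normal(size=(n_frames, 13))

features = build_feature_set(X_mfcc, X_mpeg7, y)
print(features.shape)
```

In a complete system, the resulting feature matrix would be used to train one GMM per environment, and a test segment would be assigned to the environment whose model gives the highest likelihood.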
