Environment Recognition from Audio Using MPEG-7 Features

In this paper, we make full use of MPEG-7 Audio features for environment recognition from audio in various multimedia applications. Environment recognition from audio files is a growing area of interest; however, compared with other branches of multimedia, it has received less research attention. To recognize the environment, we use a total of 17 temporal and spectral MPEG-7 Audio low-level descriptors as features. Their performance is compared with that of Mel-frequency cepstral coefficient (MFCC) features. Experimental results show a significant improvement of the proposed MPEG-7-based environment recognition over conventional MFCC-based features. Appending the zero-crossing rate to the MPEG-7-based features yields further improvement, and the best performance is achieved with the combined MFCC, MPEG-7, and zero-crossing-rate features.

Index Terms: Environment recognition, MPEG-7 Audio, MFCC, multimedia.
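The abstract describes frame-level features (MPEG-7 low-level descriptors, MFCC, and zero-crossing rate) that are combined into one feature vector per audio clip. The following is a minimal sketch, assuming NumPy, of how such a fused vector might be assembled. It implements only two illustrative descriptors rather than all 17: frame power (loosely modeled on MPEG-7 AudioPower) and a simplified linear-frequency spectral centroid, which is not the exact MPEG-7 AudioSpectrumCentroid definition. The zero-crossing rate is computed as the fraction of adjacent-sample pairs whose sign differs. The helper names, frame length, hop size, mean/std pooling, and the `mfcc_stats` input (assumed to come from a separate MFCC extractor) are hypothetical choices, not settings taken from the paper.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def zero_crossing_rate(frames):
    """Fraction of adjacent-sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def frame_power(frames):
    """Mean squared amplitude per frame (in the spirit of MPEG-7 AudioPower)."""
    return np.mean(frames ** 2, axis=1)


def spectral_centroid(frames, sr):
    """Power-weighted mean frequency in Hz per frame.

    Simplified: linear frequency axis, Hann window; NOT the exact
    MPEG-7 AudioSpectrumCentroid (which uses a log-frequency scale).
    """
    window = np.hanning(frames.shape[1])
    spec = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    total = spec.sum(axis=1)
    centroid = (spec * freqs).sum(axis=1) / np.maximum(total, 1e-12)
    return np.where(total > 0, centroid, 0.0)  # silent frames -> 0 Hz


def clip_feature_vector(x, sr, mfcc_stats=None, frame_len=1024, hop=512):
    """Hypothetical fusion: [MFCC stats] + MPEG-7-like stats + ZCR stats."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    # Per-frame descriptors, pooled over the clip by mean and std.
    per_frame = np.stack([frame_power(frames), spectral_centroid(frames, sr)], axis=1)
    mpeg7_like = np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])
    zcr = zero_crossing_rate(frames)
    zcr_stats = np.array([zcr.mean(), zcr.std()])
    parts = [mpeg7_like, zcr_stats]
    if mfcc_stats is not None:
        # MFCC statistics are assumed to be precomputed elsewhere.
        parts.insert(0, np.asarray(mfcc_stats, dtype=float))
    return np.concatenate(parts)
```

For example, `clip_feature_vector(x, 16000)` on a one-second 1 kHz sine returns a 6-element vector whose zero-crossing-rate mean is close to 0.125 (two sign changes per 16-sample period). Pooling frame-level values by mean and standard deviation is one common way to obtain a fixed-length vector for a classifier; the pooling and classifier actually used in the paper are not stated in this abstract.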
