Development of a Reference Platform for Generic Audio Classification

Detection of key sounds such as applause, laughter, music, and environmental noise is one of the challenges in the intelligent management of multimedia information and in content understanding. In this paper, we report progress in the development of a reference content-based audio classification algorithm built on a conventional and widely accepted approach: signal parameterization by mel-frequency cepstral coefficients (MFCC) followed by Gaussian mixture model (GMM) classification. The labeled audio database we developed and the conventional classification model are intended to serve as a reference platform for evaluating novel, alternative, or more advanced methods in audio content analysis.
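The MFCC-plus-GMM pipeline named above can be illustrated with a minimal sketch. It is not the authors' implementation: every function name, the frame size (512 samples), the filterbank size (26), the number of coefficients (13), the number of mixture components (4), and the use of diagonal covariances are assumptions chosen for illustration, and two synthetic signals (a noisy tone and white noise) stand in for the labeled audio database. One GMM is fitted per class, and a clip is assigned to the class whose model gives the highest average frame log-likelihood.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Compact MFCC: frame -> power spectrum -> mel filterbank -> log -> DCT-II."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular filters equally spaced on the mel scale.
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel2hz(np.linspace(0.0, hz2mel(sr / 2.0), n_mels + 2))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    lo, mid, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    fbank = np.maximum(0.0, np.minimum((freqs - lo) / (mid - lo),
                                       (hi - freqs) / (hi - mid)))
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II basis decorrelates the log filterbank energies.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T                      # shape (n_frames, n_ceps)

def log_gauss(X, means, variances):
    """Log density of each diagonal Gaussian component, shape (n_frames, n_comp)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def fit_gmm(X, n_comp=4, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (illustrative, not optimized)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_comp, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-3, (n_comp, 1))
    weights = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_r = np.log(weights) + log_gauss(X, means, variances)
        resp = np.exp(log_r - logsumexp(log_r, axis=1)[:, None])
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, 1e-3)
    return weights, means, variances

def score(X, gmm):
    """Average per-frame log-likelihood of the features under one class model."""
    w, m, v = gmm
    return logsumexp(np.log(w) + log_gauss(X, m, v), axis=1).mean()

def classify(X, models):
    """Pick the class label whose GMM scores the clip highest."""
    return max(models, key=lambda label: score(X, models[label]))

# Synthetic stand-ins for labeled clips (assumption: 2 s at 16 kHz).
def make_tone(seed, sr=16000):
    t = np.arange(2 * sr) / sr
    noise = np.random.default_rng(seed).standard_normal(len(t))
    return np.sin(2 * np.pi * 440.0 * t) + 0.05 * noise

def make_noise(seed, sr=16000):
    return np.random.default_rng(seed).standard_normal(2 * sr)

models = {"tone": fit_gmm(mfcc(make_tone(1))),
          "noise": fit_gmm(mfcc(make_noise(2)))}
pred_tone = classify(mfcc(make_tone(3)), models)
pred_noise = classify(mfcc(make_noise(4)), models)
print(pred_tone, pred_noise)
```

In practice the per-class models would be trained on many labeled clips per sound category, and the classification decision could be made per segment rather than per whole clip; both choices are outside what the abstract specifies.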
