An Empirical Study of Feature Extraction Methods for Audio Classification

With the growing popularity of video-sharing websites and the increasing use of consumer-level video capture devices, new algorithms are needed for intelligent searching and indexing of such data. The audio from these video streams is particularly challenging due to its low quality and high variability. Here, we present a broad empirical study of features used for intelligent audio processing. Our experiments use a dataset of 200 consumer videos, in which we attempt to detect 10 semantic audio concepts.
