Paper information - Content based audio classification: a neural network approach

Content based audio classification: a neural network approach

Content-based music genre classification is a key component of next-generation multimedia search agents. This paper introduces an audio classification technique based on audio content analysis. Artificial neural networks (ANNs), specifically multi-layered perceptrons (MLPs), are implemented to perform the classification task. Windowed audio files of finite length are analyzed to generate multiple feature sets, which are used as input vectors to a parallel neural architecture that performs the classification. The paper examines a combination of linear predictive coding (LPC) coefficients, mel frequency cepstral coefficients (MFCCs), and Haar wavelet, Daubechies wavelet, and Symlet coefficients as feature sets for the proposed audio classifier. Alongside the MLP, a Gaussian radial basis function (GRBF) based ANN is also implemented and analyzed. The obtained prediction accuracy of 87.3% in determining audio genres supports the effectiveness of the proposed architecture. The ANN prediction values are processed by a rule-based inference engine (IE) that presents the final decision.
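The abstract outlines a pipeline of windowing, wavelet-style feature extraction, and a final decision stage. The following is a minimal sketch of that flow, not the authors' implementation. The abstract does not give the window length, the hop size, how the coefficients are summarized, or the inference rules. The function names, the per-band energy summary, and the majority vote standing in for the rule-based inference engine are therefore all assumptions made for illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sequence into fixed-length windows, dropping any short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def haar_step(x):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_energy_features(frame, levels):
    """Detail-band energy per level, then the final approximation energy.

    Assumes len(frame) is divisible by 2 ** levels. Summarizing each band
    by its energy is an illustrative choice, not taken from the paper.
    """
    feats = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in approx))
    return feats


def majority_vote(labels):
    """Hypothetical stand-in for the decision stage: most frequent label.

    Ties are broken alphabetically so the result is deterministic.
    """
    return max(sorted(set(labels)), key=labels.count)


# Usage: window a toy signal, build one feature vector per window.
signal = [math.sin(0.3 * n) for n in range(64)]
frames = frame_signal(signal, frame_len=16, hop=8)
features = [haar_energy_features(f, levels=2) for f in frames]
```

In a full system, each feature vector would be fed to a trained classifier such as an MLP, and the per-window predictions would be combined by the decision stage. Because the Haar step here is orthonormal, the band energies of a window sum to the energy of the window itself, which is a convenient sanity check.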

Vikramjit Mitra | Chia-Jiu Wang
