Content-based audio classification: a neural network approach

Content-based music genre classification is a key component of next-generation multimedia search agents. This paper introduces an audio classification technique based on audio content analysis. Artificial neural networks (ANNs), specifically multilayer perceptrons (MLPs), are implemented to perform the classification task. Windowed audio files of finite length are analyzed to generate multiple feature sets, which serve as input vectors to a parallel neural architecture that performs the classification. This paper examines a combination of linear predictive coding (LPC) coefficients, mel-frequency cepstral coefficients (MFCCs), and Haar wavelet, Daubechies wavelet, and symlet coefficients as feature sets for the proposed audio classifier. In parallel with the MLP, an ANN based on Gaussian radial basis functions (GRBF) is also implemented and analyzed. The proposed architecture achieves a prediction accuracy of 87.3% in determining audio genres, which supports its effectiveness. The ANN prediction values are processed by a rule-based inference engine (IE) that produces the final decision.
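The feature-extraction front end described above can be sketched as follows. This is a minimal, pure-Python illustration under stated assumptions, not the paper's implementation: the frame length, hop size, LPC order, and number of wavelet levels are hypothetical defaults, only two of the five feature sets (LPC and Haar wavelet) are shown, and averaging per-frame features into one vector per clip is an assumed aggregation. LPC coefficients are computed with the autocorrelation method and the Levinson-Durbin recursion; the wavelet features are log energies of the orthonormal Haar subbands.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]

def autocorr(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(max_lag + 1)]

def lpc(frame, order):
    """LPC coefficients a_1..a_p, with prediction x[n] ~ sum_k a_k * x[n-k],
    via the autocorrelation method and the Levinson-Durbin recursion."""
    r = autocorr(frame, order)
    if r[0] == 0.0:
        return [0.0] * order              # silent frame: nothing to predict
    a = [0.0] * (order + 1)               # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k                          # reflection coefficient of stage i
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k                # prediction error shrinks each stage
    return a[1:]

def haar_dwt(x, levels):
    """Orthonormal Haar DWT: returns [detail_1, ..., detail_L, approx_L]."""
    s = 1.0 / math.sqrt(2.0)
    approx, coeffs = list(x), []
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:               # pad odd lengths by repeating the last sample
            approx.append(approx[-1])
        pairs = range(0, len(approx), 2)
        coeffs.append([(approx[i] - approx[i + 1]) * s for i in pairs])
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
    coeffs.append(approx)
    return coeffs

def feature_vector(x, frame_len=512, hop=256, order=10, levels=4):
    """Hypothetical per-clip vector: frame-averaged LPC coefficients followed by
    frame-averaged log energies of each Haar subband."""
    per_frame = []
    for f in frame_signal(x, frame_len, hop):
        energies = [math.log(sum(c * c for c in band) + 1e-12)
                    for band in haar_dwt(f, levels)]
        per_frame.append(lpc(f, order) + energies)
    if not per_frame:
        raise ValueError("signal shorter than one frame")
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]

# Usage: one second of a 440 Hz tone sampled at 22050 Hz.
clip = [math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
print(len(feature_vector(clip)))  # 15 (10 LPC coefficients + 5 Haar subband log energies)
```

The other feature sets named in the abstract (MFCCs, Daubechies wavelets, symlets) would be appended to the same per-frame list; the Haar transform is shown only because its filters are short enough to write out by hand.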

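The classification stage can be sketched in the same hedged way. A Gaussian RBF network maps a feature vector x to class scores y_k = b_k + sum_j w_kj * exp(-||x - c_j||^2 / (2 * sigma_j^2)); an MLP differs in that its hidden units compute a weighted sum of the inputs passed through a sigmoid. The abstract does not list the inference engine's rules, so `infer_genre` below uses hypothetical rules (majority vote across the parallel networks, tie-break on summed softmax probability, rejection when mean confidence is low). Treating each parallel network as scoring one feature set, and the `min_conf` threshold, are likewise assumptions; centers, widths, weights, and biases would come from training.

```python
import math
from collections import Counter

class GaussianRBFNet:
    """Gaussian RBF network with a linear output layer.

    Hidden unit j: phi_j(x) = exp(-||x - c_j||^2 / (2 * sigma_j^2)).
    Output k:      y_k(x)   = b_k + sum_j w_kj * phi_j(x).
    """

    def __init__(self, centers, sigmas, weights, biases):
        self.centers, self.sigmas = centers, sigmas
        self.weights, self.biases = weights, biases

    def hidden(self, x):
        return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * s * s))
                for c, s in zip(self.centers, self.sigmas)]

    def predict(self, x):
        phi = self.hidden(x)
        return [b + sum(w * p for w, p in zip(row, phi))
                for row, b in zip(self.weights, self.biases)]

def softmax(z):
    m = max(z)                            # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def infer_genre(score_vectors, genres, min_conf=0.5):
    """Hypothetical rule-based inference over the parallel networks' scores.

    Rule 1: each network votes for its highest-scoring genre; the majority wins.
    Rule 2: ties are broken by the largest summed softmax probability.
    Rule 3: if the winner's mean probability is below min_conf, return "unknown".
    """
    probs = [softmax(s) for s in score_vectors]
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in probs)
    top = max(votes.values())
    tied = [k for k, v in votes.items() if v == top]
    summed = [sum(p[k] for p in probs) for k in range(len(genres))]
    winner = max(tied, key=lambda k: summed[k])
    if summed[winner] / len(probs) < min_conf:
        return "unknown"
    return genres[winner]

# Usage: three parallel networks score three genres; two of them favour "rock".
genres = ["rock", "jazz", "classical"]
print(infer_genre([[5.0, 0.0, 0.0], [4.0, 1.0, 0.0], [0.0, 0.0, 3.0]], genres))  # rock
```

Keeping the combination rules outside the networks mirrors the separation the abstract describes: the ANNs only produce prediction values, and the inference engine alone decides, so rules can be changed without retraining.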