Classification of audio signals using AANN and GMM

Today, digital audio applications are part of our everyday lives. Audio classification can provide powerful tools for content management. If an audio clip can be classified automatically, it can be stored in an organised database, which can substantially improve the management of audio content. In this paper, we propose algorithms that automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories, acoustic features comprising linear predictive coefficients, linear predictive cepstral coefficients and mel-frequency cepstral coefficients are extracted to characterize the audio content. An autoassociative neural network (AANN) model is used to capture the distribution of the acoustic feature vectors of each class; the backpropagation learning algorithm adjusts the network weights to minimize the mean square error for each feature vector. We also compare the performance of the AANN with that of a Gaussian mixture model (GMM), in which the feature vectors from each class are used to train a GMM for that class. During testing, the likelihood of a test sample under each model is computed, and the sample is assigned to the class whose model produces the highest likelihood.
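
The decision rule described for testing (one model per class, with the sample assigned to the class whose model gives the highest likelihood) can be illustrated with a minimal sketch. This is not the paper's implementation. As a simplifying assumption, each class is modelled here by a single diagonal-covariance Gaussian, that is, a one-component mixture, rather than a full GMM trained with EM. The feature frames are synthetic stand-ins for the acoustic features, only two of the six classes are used, and all function names are hypothetical.

```python
import math
import random

def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(frames, model):
    """Total log-likelihood of all frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def classify(frames, models):
    """Return the class label whose model gives the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(frames, models[label]))

def synthetic_frames(rng, centre, n=200, dim=4):
    """Assumed stand-in for real feature frames: unit Gaussian noise around a centre."""
    return [[rng.gauss(centre, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
# Train one model per class on that class's frames only.
train = {"music": synthetic_frames(rng, 0.0), "news": synthetic_frames(rng, 5.0)}
models = {label: fit_diag_gaussian(frames) for label, frames in train.items()}

# Score a test clip against every class model and pick the best one.
test_clip = synthetic_frames(rng, 5.0, n=50)
predicted = classify(test_clip, models)
print(predicted)
```

Because the two synthetic classes are well separated, the test clip drawn around the "news" centre should be labelled "news". Summing log-likelihoods over frames treats the frames of a clip as independent, which is a common simplification and is assumed here rather than taken from the paper.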
