Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification

We present a comparison of 6 methods for classification of sports audio. For the feature extraction we have two choices: MPEG-7 audio features and Mel-scale frequency cepstrum coefficients (MFCC). For the classification we also have two choices: maximum likelihood hidden Markov models (ML-HMM) and entropic prior HMM (EP-HMM). EP-HMM, in turn, has two variations: with and without trimming of the model parameters. We thus have 6 possible methods, each of which corresponds to a combination. Our results show that all the combinations achieve classification accuracy of around 90% with the best and the second best being MPEG-7 features with EP-HMM and MFCC with ML-HMM.

[1]  Lie Lu,et al.  Content-based audio segmentation using support vector machines , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[2]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[3]  Malcolm Slaney,et al.  Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Douglas Keislar,et al.  Content-Based Classification, Search, and Retrieval of Audio , 1996, IEEE Multim..

[5]  Michael J. Carey,et al.  A comparison of features for speech, music discrimination , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[6]  M. Casey,et al.  MPEG-7 sound-recognition tools , 2001, IEEE Trans. Circuits Syst. Video Technol..

[7]  C.-C. Jay Kuo,et al.  Content-based classification and retrieval of audio , 1998, Optics & Photonics.