Evaluation of MFCC estimation techniques for music similarity

Spectral envelope parameters in the form of mel-frequency cepstral coefficients (MFCCs) are often used to capture timbral information in music signals for genre classification. In this paper, we evaluate three MFCC estimation techniques: the classical FFT-based and linear-prediction-based implementations, and an implementation based on the more recent minimum variance distortionless response (MVDR) spectral estimator. The performance of these methods is evaluated in genre classification using a probabilistic classifier based on Gaussian mixture models. MFCCs based on fixed-order, signal-independent linear prediction and MVDR spectral estimators did not exhibit any statistically significant improvement over MFCCs based on the simpler FFT.
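To make the comparison concrete, the following is a minimal sketch of how MFCCs can be computed from each of the three spectral estimates for a single windowed frame. It is not the authors' implementation. The function names, the mel-scale formula, the sample rate, the model order, the number of mel filters, and the number of coefficients are all assumptions chosen for illustration, and the MVDR spectrum is computed directly from its textbook definition, 1 / (v^H R^-1 v), rather than from any faster recursion.

```python
import numpy as np

def autocorr(frame, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)]) / n

def toeplitz(r):
    """Symmetric Toeplitz matrix with first row r."""
    idx = np.abs(np.subtract.outer(np.arange(len(r)), np.arange(len(r))))
    return r[idx]

def fft_spectrum(frame, n_fft):
    """Periodogram: squared magnitude of the FFT."""
    return np.abs(np.fft.rfft(frame, n_fft)) ** 2 / len(frame)

def lp_spectrum(frame, order, n_fft):
    """All-pole spectrum from the autocorrelation-method normal equations."""
    r = autocorr(frame, order)
    a = np.linalg.solve(toeplitz(r[:order]), -r[1:])   # predictor coefficients
    err = r[0] + np.dot(a, r[1:])                      # prediction error power
    A = np.fft.rfft(np.concatenate(([1.0], a)), n_fft)
    return err / np.abs(A) ** 2

def mvdr_spectrum(frame, order, n_fft):
    """MVDR (Capon) spectrum 1 / (v^H R^-1 v) on the rfft frequency grid."""
    r = autocorr(frame, order)
    R_inv = np.linalg.inv(toeplitz(r))
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    V = np.exp(1j * np.outer(omega, np.arange(order + 1)))  # one steering vector per row
    denom = np.einsum("fi,ij,fj->f", V.conj(), R_inv, V).real
    return 1.0 / denom

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the (assumed) mel scale."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc_from_spectrum(power, fb, n_coeffs):
    """Log mel-band energies followed by an unnormalised DCT-II."""
    log_e = np.log(fb @ power + 1e-12)
    n = len(log_e)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ log_e

# Illustrative settings only; none of these values come from the paper.
sr, n_fft, order, n_filters, n_coeffs = 22050, 512, 25, 20, 8
rng = np.random.default_rng(0)
frame = np.hamming(n_fft) * rng.standard_normal(n_fft)  # stand-in for an audio frame
fb = mel_filterbank(n_filters, n_fft, sr)
spectra = {
    "fft": fft_spectrum(frame, n_fft),
    "lp": lp_spectrum(frame, order, n_fft),
    "mvdr": mvdr_spectrum(frame, order, n_fft),
}
features = {name: mfcc_from_spectrum(p, fb, n_coeffs) for name, p in spectra.items()}
```

Only the spectral estimate differs between the three variants; the mel filterbank, logarithm, and DCT stages are shared, which is what makes a like-for-like comparison of the estimators possible. In a genre classification setup such as the one described, per-frame vectors like these would then be modelled with Gaussian mixture models.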
