Musical instrument identification using multiscale Mel-frequency cepstral coefficients

We investigate the benefits of evaluating Mel-frequency cepstral coefficients (MFCCs) over several time scales in the context of automatic musical instrument identification for signals that are monophonic but derived from real musical settings. We define several sets of features derived from MFCCs computed at multiple time resolutions, and compare their performance against features computed at a single time resolution, such as MFCCs and their derivatives. We find that in both tasks considered, pairwise discrimination and one-vs-all classification, the features built from multiscale decompositions perform significantly better than features computed at a single time resolution.
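
To make the multiscale idea concrete, the sketch below computes MFCCs with several analysis-window lengths and concatenates per-scale summary statistics into one feature vector, alongside a single-scale MFCC-plus-derivatives baseline. This is a minimal illustration under assumed choices, not the paper's exact feature sets: the use of librosa, the specific window lengths, the number of coefficients, and mean/standard-deviation pooling over time are all assumptions introduced here.

```python
# Minimal sketch of multiscale MFCC features (assumptions: librosa for the
# MFCC computation, window lengths of 512-4096 samples, 13 coefficients,
# and mean/std pooling over time; none of these values come from the paper).
import numpy as np
import librosa


def multiscale_mfcc_features(path, n_mfcc=13, win_lengths=(512, 1024, 2048, 4096)):
    """Return one feature vector built from MFCCs at several time scales."""
    y, sr = librosa.load(path, sr=None, mono=True)
    per_scale = []
    for n_fft in win_lengths:
        # MFCCs computed with this window length (hop set to half a window).
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=n_fft, hop_length=n_fft // 2)
        # Summarise the time axis so every scale contributes a fixed-size block.
        per_scale.append(np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)]))
    return np.concatenate(per_scale)


def mfcc_delta_features(path, n_mfcc=13, n_fft=2048):
    """Single-scale baseline: MFCCs plus their temporal derivatives."""
    y, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=n_fft // 2)
    delta = librosa.feature.delta(mfcc)
    feats = np.vstack([mfcc, delta])
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])
```

Feature vectors of either kind could then be fed to a standard classifier (e.g., support vector machines trained pairwise or one-vs-all) to reproduce the style of comparison described above; the classifier choice here is again an assumption, not a statement of the paper's experimental setup.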
