Sparse cepstral codes and power scale for instrument identification

This paper presents a novel feature representation, sparse cepstral codes, for instrument identification. We first motivate the approach by discussing why the cepstrum is suitable for instrument identification. We then propose using sparse coding and power normalization to derive compact codes that better represent the information in the cepstrum. Our evaluation on both uni-source and multi-source instrument identification tasks shows that the proposed feature yields significantly better accuracy than existing methods. We further show that a cepstrum computed from a power-scaled spectrum can outperform the typical (log-scaled) cepstrum, especially on multi-source signals. The proposed system achieves an F-score of 0.955 on the uni-source dataset and 0.688 on the multi-source dataset.
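
As a rough illustration of the two ingredients named above, the following sketch computes a cepstrum from a power-scaled magnitude spectrum (sometimes called a root cepstrum) and then encodes it sparsely over a dictionary. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the exponent `gamma`, the number of coefficients `n_coef`, the penalty `lam`, and the use of plain ISTA as the L1 solver. The dictionary `D` is assumed to be given, for example learned beforehand from training frames.

```python
import numpy as np


def power_scaled_cepstrum(frame, gamma=0.1, n_coef=40):
    """Cepstrum of one audio frame using |X|^gamma instead of log|X|.

    gamma and n_coef are illustrative values, not the paper's settings.
    """
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    scaled = magnitude ** gamma          # power scaling replaces the logarithm
    cepstrum = np.fft.irfft(scaled)      # inverse transform back to quefrency
    return cepstrum[:n_coef]             # keep the low-quefrency coefficients


def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Approximately minimise 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    ISTA is a simple stand-in solver chosen for this sketch.
    D has shape (feature_dim, n_atoms); the returned code has length n_atoms.
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)             # gradient of the quadratic term
        z = a - step * grad
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.sin(2 * np.pi * 440 * np.arange(1024) / 44100)  # synthetic test tone
    D = rng.standard_normal((40, 128))
    D /= np.linalg.norm(D, axis=0)                             # unit-norm atoms
    code = sparse_code_ista(power_scaled_cepstrum(frame), D)
    print(code.shape, int(np.count_nonzero(code)))
```

In a bag-of-frames setting, the per-frame codes would typically be pooled over a clip (for example by summing or taking the maximum) before being passed to a classifier; that step is omitted here.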
