A novel cepstral representation for timbre modeling of sound sources in polyphonic mixtures

We propose a novel cepstral representation called the uniform discrete cepstrum (UDC) to represent the timbre of sound sources in a sound mixture. Unlike the ordinary cepstrum and MFCC, which must be calculated from the full magnitude spectrum of a source after source separation, UDC can be calculated directly from isolated spectral points in the mixture spectrum that are likely to belong to the source (e.g., the non-overlapping harmonics of a harmonic source). The discrete cepstrum and the regularized discrete cepstrum also have this property; however, compared to the proposed UDC, they are less effective and more complex to compute. The key advantage of UDC is that it uses a more natural and locally adaptive regularizer to prevent it from overfitting the isolated spectral points. We derive the mathematical relations between these cepstral representations and compare their timbre modeling performance on the task of instrument recognition in polyphonic audio mixtures. We show that UDC and its mel-scale variant, MUDC, significantly outperform all the other representations.
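The abstract contrasts UDC with the discrete cepstrum and the regularized discrete cepstrum. Both baselines fit cepstral coefficients directly to the log-amplitudes of isolated spectral points. The sketch below illustrates only those two baselines, not UDC itself, because the abstract does not give the UDC formula. It assumes normalized frequencies in [0, 0.5], the cosine envelope model log|S(f)| = c_0 + 2 Σ_i c_i cos(2π f i), and a first-derivative roughness penalty R = 8π² diag(0, 1², ..., p²). All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cepstral_basis(freq: float, order: int) -> List[float]:
    """One row of the basis matrix M (assumed model):
    log|S(f)| = c0 + 2 * sum_i c_i * cos(2*pi*f*i), f normalized to [0, 0.5]."""
    return [1.0] + [2.0 * math.cos(2.0 * math.pi * freq * i) for i in range(1, order + 1)]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: add regularization or lower the order")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def discrete_cepstrum(freqs: Sequence[float], amps: Sequence[float],
                      order: int, lam: float = 0.0) -> List[float]:
    """Fit cepstral coefficients to log-amplitudes at isolated frequencies.

    lam == 0: least-squares discrete cepstrum, c = (M^T M)^-1 M^T a
    lam  > 0: regularized variant, c = (M^T M + lam * R)^-1 M^T a,
              with R = 8*pi^2 * diag(0, 1^2, ..., order^2) penalizing envelope roughness.
    """
    rows = [cepstral_basis(f, order) for f in freqs]
    logs = [math.log(a) for a in amps]
    p = order + 1
    mtm = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    mta = [sum(row[i] * y for row, y in zip(rows, logs)) for i in range(p)]
    for i in range(p):
        mtm[i][i] += lam * 8.0 * math.pi ** 2 * i ** 2  # roughness penalty on diagonal
    return solve_linear(mtm, mta)


def log_envelope(c: Sequence[float], freq: float) -> float:
    """Evaluate the cepstral log-amplitude envelope at a normalized frequency."""
    return sum(ci * bi for ci, bi in zip(c, cepstral_basis(freq, len(c) - 1)))


if __name__ == "__main__":
    # Harmonics of a hypothetical source with normalized f0 = 0.02 and 1/k amplitudes.
    harmonics = [0.02 * k for k in range(1, 21)]
    amplitudes = [1.0 / k for k in range(1, 21)]
    print(discrete_cepstrum(harmonics, amplitudes, order=6, lam=0.01))
```

With `lam=0` and fewer spectral points than coefficients, M^T M becomes singular. This illustrates the overfitting risk the abstract mentions. Any `lam > 0` keeps the system solvable, but this penalty is a fixed global one that does not depend on where the observed points lie.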
