Cochlear pitch class profile for cover song identification

Abstract: Pitch class profile (PCP), which captures the harmonic progression of a piece of music well, is one of the most widely used audio features for cover version identification. In this letter, we describe a novel procedure that enhances PCP by substantially boosting its invariance to instrumental accompaniment without degrading its discriminative power. Our idea is based on the assumption that the human ear can quickly and easily identify a cover of a pop song from its singing voice. We therefore combine two concepts from psychoacoustics, (i) the time-varying loudness contour and (ii) the critical band, both of which have been applied successfully in speech recognition, with the conventional PCP descriptor to enhance its discriminative power. Since the resulting feature, called the cochlear pitch class profile (CPCP), aims to represent the singing voice, it may also yield improved performance over the conventional PCP when applied to a cappella singing recordings. Experimental results demonstrate that CPCP outperforms the conventional PCP feature in the context of pop cover song identification.
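For orientation, the following is a minimal sketch of the conventional Fujishima-style PCP (12-bin chroma) that the proposed CPCP builds on. It is not the authors' method: the cochlear front end (gammachirp-style filterbank and time-varying loudness model) is only approximated here by a simple power-law compression of spectral energy, which is an assumption made purely for illustration, and all function and parameter names are hypothetical.

```python
import numpy as np


def pcp_from_audio(x, sr, n_fft=4096, hop=1024, f_ref=440.0,
                   fmin=55.0, fmax=1760.0):
    """Conventional pitch class profile (12-bin chroma) from a mono signal x.

    Sketch only: the paper's cochlear processing is replaced by a crude
    power-law compression (an assumption, not the authors' model).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)

    # Map each FFT bin inside the analysis range to one of 12 pitch classes.
    in_range = (freqs >= fmin) & (freqs <= fmax)
    pitch_class = np.zeros(len(freqs), dtype=int)
    pitch_class[in_range] = np.mod(
        np.round(12 * np.log2(freqs[in_range] / f_ref)), 12
    ).astype(int)

    pcp = np.zeros((12, n_frames))
    for t in range(n_frames):
        frame = x[t * hop:t * hop + n_fft] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Stand-in for loudness compression (illustrative assumption only).
        power = power ** 0.3
        for pc in range(12):
            pcp[pc, t] = power[in_range & (pitch_class == pc)].sum()

    # Normalise each frame so bins describe relative pitch-class energy.
    norm = pcp.sum(axis=0, keepdims=True)
    norm[norm == 0] = 1.0
    return pcp / norm
```

In a cover song identification pipeline, frame-wise profiles like these are typically compared between two recordings with an alignment method (e.g. dynamic programming over beat-synchronous frames); the letter's contribution concerns how the profile itself is computed, not the matching stage.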
