Sparse Modeling for Artist Identification: Exploiting Phase Information and Vocal Separation

Because artist identification concerns the vocal part of music, techniques such as singing voice separation and speech feature extraction have been found relevant. In this paper, we argue that phase information, which is usually overlooked in the literature, is also informative for modeling a singer's vocal timbre, given appropriate processing techniques. Specifically, instead of using the raw phase spectrum directly as a feature, we show that significantly better performance can be obtained by learning sparse features from the negative derivative of phase with respect to frequency (i.e., the group delay function) using unsupervised feature learning algorithms. Performance improves further when singing voice separation is applied as a pre-processing step and features are learned from both the magnitude spectrum and the group delay function. The proposed system achieves 66% accuracy in identifying the 20 artists of the artist20 dataset, outperforming prior art by 7%.
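As a minimal illustration of the group delay function mentioned above (not the authors' exact pipeline; the function name and frame handling here are assumptions), the group delay of a signal frame can be computed without explicit phase unwrapping using the standard identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, where Y is the DFT of n*x[n]:

```python
import numpy as np

def group_delay(frame):
    """Group delay tau(w) = -d(phase)/d(omega) of one frame.

    Computed via the standard identity
        tau = (X_R*Y_R + X_I*Y_I) / |X|^2,   Y = DFT(n * x[n]),
    which avoids phase unwrapping entirely.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)          # spectrum of the frame
    Y = np.fft.rfft(n * frame)      # spectrum of the time-weighted frame
    eps = 1e-12                     # guard against near-zero magnitudes
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)
```

As a sanity check, a unit impulse delayed by k samples has constant group delay k at every frequency bin; sparse features would then be learned from such per-frame group delay vectors rather than from the raw phase spectrum.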
