Singer identification based on computational auditory scene analysis and missing feature methods

A major challenge in identifying singers from monaural popular music recordings is removing or alleviating the influence of the accompaniment. Our system operates in two stages. In the first stage, we exploit computational auditory scene analysis (CASA) to segregate singing-voice units from the mixture signal. The pitch of the singing voice is first estimated and used to extract pitch-based features for each unit in an acoustic vector. These features are then used to estimate binary time-frequency (T-F) masks, where 1 indicates that the corresponding T-F unit is dominated by the singing voice and 0 indicates otherwise. Units dominated by the singing voice are considered reliable; the remaining units are unreliable or missing, so the acoustic vector is incomplete. In the second stage, two missing-feature methods, acoustic-vector reconstruction and marginalization, handle the incomplete acoustic vectors to identify the singer. In the reconstruction method, the complete acoustic vector is first reconstructed and then converted to Gammatone frequency cepstral coefficients (GFCCs), which are used to identify the singer. In the marginalization method, the probability that the voice belongs to a given singer is computed from the reliable components only. We find that the reconstruction method outperforms the marginalization method, while both perform well, especially at signal-to-accompaniment ratios (SARs) of 0 dB and −3 dB, compared with an existing system.
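The two core ideas above, a binary T-F mask separating reliable from unreliable units and marginalization that scores a singer model on the reliable components only, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the diagonal-covariance Gaussian mixture model, and the ideal-mask criterion (voice energy exceeding accompaniment energy) are assumptions for the sketch.

```python
import numpy as np

def binary_mask(voice_energy, accomp_energy):
    """Ideal binary T-F mask: 1 where the singing voice dominates a unit,
    0 where the accompaniment dominates (unit treated as missing)."""
    return (voice_energy > accomp_energy).astype(int)

def marginal_log_likelihood(x, mask, means, variances, weights):
    """Score one acoustic vector against a diagonal-covariance GMM
    (one such model per singer), using only the reliable (mask == 1)
    components and marginalizing out the unreliable ones."""
    r = mask.astype(bool)              # reliable feature dimensions
    diff = x[r] - means[:, r]          # deviation from each mixture mean
    # Per-mixture log of weight * N(x_r; mu_r, diag(var_r))
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * variances[:, r]), axis=1)
                - 0.5 * np.sum(diff ** 2 / variances[:, r], axis=1))
    m = log_comp.max()                 # log-sum-exp over mixtures
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

Identification then amounts to summing these frame-level scores over an excerpt for each singer's model and picking the singer with the highest total; the reconstruction alternative would instead fill in the masked components before converting to GFCCs and scoring complete vectors.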
