Background Music Removal Based on Cepstrum Transformation for Popular Singer Identification

One major challenge of identifying singers in popular music recordings lies in how to reduce the interference of background accompaniment in trying to characterize the singer voice. Although a number of studies on automatic Singer IDentification (SID) from acoustic features have been reported, most systems to date, however, do not explicitly deal with the background accompaniment. This study proposes a background accompaniment removal approach for SID by exploiting the underlying relationships between solo singing voices and their accompanied versions in cepstrum. The relationships are characterized by a transformation estimated using a large set of accompanied singing generated by manually mixing solo singing with the accompaniments extracted from Karaoke VCDs. Such a transformation reflects the cepstrum variations of a singing voice before and after it is added with accompaniments. When an unknown accompanied voice is presented to our system, the transformation is performed to convert the cepstrum of the accompanied voice into a solo-voice-like one. Our experiments show that such a background removal approach improves the SID accuracy significantly; even when a test music recording involves sung language not covered in the data for estimating the transformation.

[1]  Xiaodong Cui,et al.  Stereo-Based Stochastic Mapping for Robust Speech Recognition , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Jaakko Astola,et al.  The Mel-Frequency Cepstral Coefficients in the Context of Singer Identification , 2005, ISMIR.

[3]  Tuomas Virtanen,et al.  Automatic Recognition of Lyrics in Singing , 2010, EURASIP J. Audio Speech Music. Process..

[4]  Changsheng Xu,et al.  Singer identification based on vocal and instrumental models , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[5]  Hsin-Min Wang,et al.  Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Haizhou Li,et al.  Exploring Vibrato-Motivated Acoustic Features for Singer Identification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Haizhou Li,et al.  On fusion of timbre-motivated features for singing voice detection and singer identification , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8]  Tong Zhang,et al.  Automatic singer identification , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[9]  Stephen Cox,et al.  Some statistical issues in the comparison of speech recognition algorithms , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[10]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..

[11]  Li Deng,et al.  Evaluation of the SPLICE algorithm on the Aurora2 database , 2001, INTERSPEECH.

[12]  Charles W. Therrien,et al.  Discrete Random Signals and Statistical Signal Processing , 1992 .

[13]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[14]  Kian-Lee Tan,et al.  Towards efficient automated singer identification in large music databases , 2006, SIGIR.

[15]  Hsin-Min Wang,et al.  Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics , 2004, Computer Music Journal.

[16]  Li Deng,et al.  Large-vocabulary speech recognition under adverse acoustic environments , 2000, INTERSPEECH.

[17]  Eric Moulines,et al.  Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..

[18]  Anssi Klapuri,et al.  Singer Identification in Polyphonic Music Using Vocal Separation and Pattern Recognition Methods , 2007, ISMIR.

[19]  Youngmoo E. Kim,et al.  Singer Identification in Popular Music Recordings Using Voice Coding Features , 2002 .

[20]  Douglas D. O'Shaughnessy,et al.  Statistical recovery of wideband speech from narrowband speech , 1992, IEEE Trans. Speech Audio Process..

[21]  Masataka Goto,et al.  RWC Music Database: Popular, Classical and Jazz Music Databases , 2002, ISMIR.

[22]  Chih-Chin Liu,et al.  A singer identification technique for content-based classification of MP3 music objects , 2002, CIKM '02.

[23]  Gregory H. Wakefield,et al.  Singing voice identification using spectral envelope estimation , 2004, IEEE Transactions on Speech and Audio Processing.

[24]  Hiromasa Fujihara,et al.  Singer Identification Based on Accompaniment Sound Reduction and Reliable Frame Selection , 2005, ISMIR.

[25]  Wei-Ho Tsai,et al.  Automatic Identification of Simultaneous Singers in Duet Recordings , 2008, ISMIR.

[26]  Hiromasa Fujihara,et al.  A Modeling of Singing Voice Robust to Accompaniment Sounds and Its Application to Singer Identification and Vocal-Timbre-Similarity-Based Music Information Retrieval , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[27]  Steve Lawrence,et al.  Artist detection in music with Minnowmatch , 2001, Neural Networks for Signal Processing XI: Proceedings of the 2001 IEEE Signal Processing Society Workshop (IEEE Cat. No.01TH8584).

[28]  Daniel P. W. Ellis,et al.  USING VOICE SEGMENTS TO IMPROVE ARTIST CLASSIFICATION OF MUSIC , 2002 .