Singing voice detection for karaoke application

We present a framework to detect the regions of singing voice in musical audio signals. This work is oriented towards the development of a robust transcriber of lyrics for karaoke applications. The technique leverages a combination of low-level audio features and higher-level musical knowledge of rhythm and tonality. Musical knowledge of the key is used to create a song-specific filterbank that attenuates the pitched musical instruments. This is followed by subband processing of the audio to detect the musical octaves in which the vocals are present. Text processing is employed to approximate the duration of the sung passages using freely available lyrics. This estimate is used to obtain a dynamic threshold for vocal/non-vocal segmentation. This pairing of audio and text processing helps create a more accurate system. Experimental evaluation on a small database of popular songs supports the validity of the proposed approach. Both holistic and per-component evaluations of the system are conducted, and various improvements are discussed.
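To illustrate how a lyrics-derived duration estimate could drive a dynamic threshold, the sketch below is one possible reading, not the paper's implementation. It assumes a per-frame "vocal score" is already available from the audio front end, and that the sung duration is approximated as a syllable count times an average syllable length. The function names, the per-syllable duration, and the example scores are all hypothetical.

```python
def estimate_sung_fraction(n_syllables, seconds_per_syllable, song_seconds):
    """Approximate the fraction of the song that is sung (assumed model:
    syllable count times a fixed average syllable duration)."""
    if song_seconds <= 0:
        raise ValueError("song_seconds must be positive")
    return min(1.0, n_syllables * seconds_per_syllable / song_seconds)


def dynamic_threshold(scores, sung_fraction):
    """Choose a threshold so that roughly `sung_fraction` of the frames
    score at or above it."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 <= sung_fraction <= 1.0:
        raise ValueError("sung_fraction must lie in [0, 1]")
    ranked = sorted(scores, reverse=True)
    n_vocal = round(sung_fraction * len(ranked))
    if n_vocal == 0:
        return float("inf")  # no frame is expected to be vocal
    return ranked[n_vocal - 1]


def segment(scores, threshold):
    """Label each frame True (vocal) or False (non-vocal)."""
    return [s >= threshold for s in scores]


# Hypothetical per-frame vocal scores for a 12-second excerpt.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3, 0.05, 0.6, 0.4, 0.15]
fraction = estimate_sung_fraction(24, 0.25, 12.0)
threshold = dynamic_threshold(scores, fraction)
labels = segment(scores, threshold)
```

Because the threshold is taken from the ranked scores of the song itself, it adapts to each recording instead of relying on a fixed global value. Tied scores at the threshold can label slightly more frames as vocal than the estimate suggests.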
