Automatic Identification of the Sung Language in Popular Music Recordings

Abstract: As part of research into content-based music information retrieval (MIR), this paper presents a preliminary attempt to automatically identify the language sung in popular music recordings. It is assumed that each language has its own set of constraints that specify the sequence of basic linguistic events when lyrics are sung. Thus, the acoustic structure of an individual language may be characterized by statistically modelling those constraints. To achieve this, the proposed method employs vector clustering to convert a singing signal from its spectrum-based feature representation into a sequence of smaller basic phonological units. The dynamic characteristics of the sequence are then analysed using bigram language models. As vector clustering is performed in an unsupervised manner, the resulting system does not need sophisticated linguistic knowledge and is therefore easily portable to new language sets. In addition, to reduce interference from background music, the method uses a statistical estimate of a song's background musical accompaniment so that the vector clustering reflects the solo singing voice in the accompanied signal.
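
The pipeline described in the abstract (unsupervised clustering of feature frames into tokens, followed by per-language bigram models) can be illustrated with a minimal sketch. This is not the authors' implementation: plain k-means stands in for the paper's vector clustering, add-one smoothing is an assumed choice, the background-accompaniment compensation is omitted, and every function name below (`train_codebook`, `tokenize`, `train_bigram`, `identify`) is hypothetical.

```python
import math
import random
from collections import Counter


def nearest(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )


def train_codebook(frames, k, iters=20, seed=0):
    """Plain k-means over feature frames; no language labels are used."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(f, codebook)].append(f)
        for i, group in enumerate(groups):
            if group:  # keep the old codeword if a cluster is empty
                codebook[i] = [sum(col) / len(group) for col in zip(*group)]
    return codebook


def tokenize(frames, codebook):
    """Replace each feature frame with the index of its nearest codeword."""
    return [nearest(f, codebook) for f in frames]


def train_bigram(token_seqs, k):
    """Add-one smoothed bigram log-probabilities, log P(cur | prev)."""
    pair_counts, prev_counts = Counter(), Counter()
    for seq in token_seqs:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    return {
        (p, c): math.log((pair_counts[(p, c)] + 1) / (prev_counts[p] + k))
        for p in range(k)
        for c in range(k)
    }


def score(seq, model):
    """Log-likelihood of a token sequence under one bigram model."""
    return sum(model[(p, c)] for p, c in zip(seq, seq[1:]))


def identify(seq, models):
    """Pick the language whose bigram model scores the sequence highest."""
    return max(models, key=lambda lang: score(seq, models[lang]))
```

In use, one codebook would be trained on pooled frames from all languages, each training recording would be tokenized with it, and one bigram model would be trained per language; a test recording is then tokenized and passed to `identify`. Because only the codebook and the token statistics are learned, adding a language in this sketch needs only more recordings, which mirrors the portability argument made in the abstract.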
