Code-switching event detection based on delta-BIC using phonetic eigenvoice models

This paper presents a new paradigm for code-switching event detection based on the delta Bayesian Information Criterion (∆BIC). First, an automatic speech recognizer (ASR) and an articulatory feature (AF) detector are constructed. The inter-syllable boundaries obtained from the ASR are treated as potential code-switching boundaries. To estimate the language likelihood, eigenvoice models (EVMs) are employed to model the relationship between the senones or articulatory attributes and their corresponding eigenvoices, which are constructed from the training data for each language. For ∆BIC-based language likelihood estimation, the Euclidean distance and the inner-product-based direction are calculated between the eigenvoice vector of the input sentence and the eigenvoice vector of each senone or articulatory attribute in the EVMs of the different languages. An n-syllable Bayesian mask centered at each potential boundary then outputs the likelihood of a language change at that boundary. Finally, a dynamic programming algorithm searches for the best language sequence given the inter-syllable boundaries from the ASR. The proposed approach was evaluated on a Chinese-English code-switching speech database and achieved 71.93% accuracy for code-switching event detection.
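The final search step can be illustrated with a minimal Viterbi-style sketch. This is not the authors' implementation. It assumes two languages, that each syllable already carries a per-language log-likelihood (for example from the EVM-based scoring), and that each inter-syllable boundary carries a language-change probability (for example from the Bayesian mask). The function name `best_language_sequence`, its parameters, and all numbers are hypothetical.

```python
import math


def best_language_sequence(lang_scores, change_probs, eps=1e-9):
    """Viterbi search for the most likely language label per syllable.

    lang_scores  : list of dicts, one per syllable, {language: log-likelihood}
                   (assumed input; stands in for the EVM-based scores)
    change_probs : list of floats, one per inter-syllable boundary, giving the
                   probability of a language change at that boundary
                   (assumed input; stands in for the Bayesian-mask output)
    """
    languages = list(lang_scores[0])
    # Best cumulative log-score of any path ending in each language.
    best = {lang: lang_scores[0][lang] for lang in languages}
    backpointers = []

    for t in range(1, len(lang_scores)):
        # Clamp so that both logarithms stay finite.
        p = min(max(change_probs[t - 1], eps), 1.0 - eps)
        log_switch, log_stay = math.log(p), math.log(1.0 - p)

        new_best, pointers = {}, {}
        for cur in languages:
            # Score of reaching `cur` from every possible previous language.
            candidates = {
                prev: best[prev] + (log_stay if prev == cur else log_switch)
                for prev in languages
            }
            prev_best = max(candidates, key=candidates.get)
            new_best[cur] = candidates[prev_best] + lang_scores[t][cur]
            pointers[cur] = prev_best
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final language.
    path = [max(best, key=best.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative numbers only: four syllables, a likely switch at boundary 2.
scores = [
    {"zh": -1.0, "en": -5.0},
    {"zh": -1.0, "en": -4.0},
    {"zh": -6.0, "en": -1.0},
    {"zh": -5.0, "en": -1.0},
]
probs = [0.1, 0.8, 0.1]
print(best_language_sequence(scores, probs))  # ['zh', 'zh', 'en', 'en']
```

A code-switching event is then reported at every boundary where two adjacent labels in the returned sequence differ. With more than two languages, the switch probability would have to be shared among the alternative languages, which this sketch does not do.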
