Acoustic modeling using an extended phone set considering cross-lingual pronunciation variations

To address both data imbalance in multilingual speech recognition and pronunciation variation across languages, we propose an approach for clustering context-dependent phones from an extended phone set in an acoustic model trained on a data-imbalanced bilingual corpus. First, we generate an extended phone set using pronunciation modeling with a confidence measure between Mandarin and Taiwanese. Second, we use two-step agglomerative hierarchical clustering with the delta Bayesian information criterion (delta-BIC) to automatically generate a merged extended phone set (MEPS). Third, we apply a parametric modeling technique, model complexity selection, to adjust the final number of Gaussian components according to the amount of training data available under imbalanced conditions. The experimental results show that the proposed automatic extended-phone clustering approach achieves an 8.3% relative reduction in syllable error rate compared with the best result of the decision-tree-based phone clustering approach.
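As an illustration of the second step, the following is a minimal sketch of agglomerative clustering driven by delta-BIC. It is not the authors' two-step procedure: it assumes each phone is represented by a matrix of acoustic feature frames, models every cluster with a single full-covariance Gaussian, and uses the common sign convention in which a negative delta-BIC favors merging. The names `delta_bic`, `cluster_phones`, and `penalty_weight` are hypothetical.

```python
import numpy as np


def _n_logdet(x):
    """Return n * log|Sigma| for one full-covariance Gaussian fit to x (n, d)."""
    n, d = x.shape
    # Maximum-likelihood covariance with a small floor for numerical stability.
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return n * logdet


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for merging clusters x and y (assumed convention: < 0 favors merging)."""
    n = x.shape[0] + y.shape[0]
    d = x.shape[1]
    merged = np.vstack([x, y])
    # Free parameters of one Gaussian: mean (d) plus symmetric covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return 0.5 * (_n_logdet(merged) - _n_logdet(x) - _n_logdet(y)) - penalty


def cluster_phones(frames_by_phone, penalty_weight=1.0):
    """Greedily merge the closest pair of phones until no pair has delta-BIC < 0."""
    clusters = dict(frames_by_phone)
    while len(clusters) > 1:
        names = sorted(clusters)
        score, a, b = min(
            (delta_bic(clusters[a], clusters[b], penalty_weight), a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
        )
        if score >= 0:
            break  # every remaining pair is better modeled separately
        clusters[a + "+" + b] = np.vstack([clusters.pop(a), clusters.pop(b)])
    return clusters
```

Calling `cluster_phones` with a dictionary that maps phone labels to frame matrices returns a dictionary of merged clusters, where a merged label joins its members with `+`. Raising `penalty_weight` makes merging more likely and so yields a smaller phone set.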
