Chinese-English bilingual phone modeling for cross-language speech recognition

This paper investigates and compares three approaches to Chinese-English bilingual phone modeling. The first simply combines the Chinese and English phone inventories, with no phone sharing across the languages. The second constructs the bilingual phone inventory by mapping language-dependent phones to the inventory of the International Phonetic Association (IPA) on the basis of phonetic knowledge. The third merges the language-dependent phone models with a hierarchical phone clustering algorithm to obtain a compact bilingual inventory. In this third approach, the bottom-up clustering is performed with two distance measures: the Bhattacharyya distance and the acoustic likelihood distance. Experimental results show that the phone clustering approach outperforms the IPA-based phone mapping approach. It also achieves performance comparable to the simple combination of language-dependent phone inventories while using fewer model parameters, especially when the acoustic likelihood distance is used.
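As a rough illustration of the third approach, the sketch below runs bottom-up clustering over phone models using the Bhattacharyya distance. It is not the authors' implementation and rests on several assumptions: each phone is reduced to a single diagonal-covariance Gaussian (real systems would typically use HMMs with mixture densities), merged clusters are re-estimated by moment matching, merging stops at a fixed distance threshold, and the phone labels and function names (`bhattacharyya`, `merge`, `cluster_phones`) are hypothetical.

```python
import math


def bhattacharyya(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = 0.5 * (va + vb)  # averaged variance for this dimension
        dist += 0.125 * (ma - mb) ** 2 / v
        dist += 0.5 * math.log(v / math.sqrt(va * vb))
    return dist


def merge(a, b):
    """Merge two clusters by moment matching (an assumption of this sketch)."""
    labels_a, mean_a, var_a, n_a = a
    labels_b, mean_b, var_b, n_b = b
    n = n_a + n_b
    mean = [(n_a * x + n_b * y) / n for x, y in zip(mean_a, mean_b)]
    var = [
        (n_a * (va + x * x) + n_b * (vb + y * y)) / n - m * m
        for x, y, va, vb, m in zip(mean_a, mean_b, var_a, var_b, mean)
    ]
    return (labels_a + labels_b, mean, var, n)


def cluster_phones(models, threshold):
    """Repeatedly merge the closest pair until the smallest distance exceeds threshold.

    models maps a phone label to a (mean, variance) pair of equal-length lists.
    Returns a list of clusters, each a sorted list of phone labels.
    """
    clusters = [([label], list(m), list(v), 1) for label, (m, v) in models.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = bhattacharyya(clusters[i][1], clusters[i][2],
                                  clusters[j][1], clusters[j][2])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c[0]) for c in clusters]


# Toy one-dimensional models with made-up labels and values.
toy_models = {
    "zh_a": ([0.0], [1.0]),
    "en_aa": ([0.1], [1.0]),
    "en_s": ([5.0], [1.0]),
}
shared_inventory = cluster_phones(toy_models, threshold=0.5)
```

In this toy run the two acoustically close vowels end up in one shared unit while the distant phone stays separate. A lower threshold keeps more language-dependent phones apart, and a higher one yields a more compact inventory.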
