Recognition of multilingual speech in mobile applications

We evaluate several architectures for recognizing multilingual speech in real-time mobile applications. In particular, we show that combining the results of several recognizers substantially outperforms alternatives such as training a single large multilingual system or using an explicit language identification system to select the appropriate recognizer. Experiments are conducted on a trilingual English-French-Mandarin mobile speech task. The data set includes Google searches and Maps queries, as well as more general inputs such as email and short message dictation. Without the input language being specified in advance, the combined system achieves accuracy comparable to that of the monolingual systems when the input language is known. The combined system is also roughly 5% absolute better than an explicit language identification approach, and 10% better than a single large multilingual system.
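The contrast between the combined approach and the language identification baseline can be illustrated with a minimal sketch. It assumes that each monolingual recognizer returns one hypothesis with a confidence score that is comparable across languages, and that combination simply keeps the most confident hypothesis. The abstract does not state the actual combination rule, so `Hypothesis`, `combine`, `select_by_language_id`, and the stub recognizers are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Hypothesis:
    language: str
    text: str
    confidence: float  # assumption: scores are comparable across recognizers


# A recognizer maps raw audio bytes to its best hypothesis (hypothetical interface).
Recognizer = Callable[[bytes], Hypothesis]


def combine(audio: bytes, recognizers: Dict[str, Recognizer]) -> Hypothesis:
    """Run every monolingual recognizer and keep the most confident result."""
    if not recognizers:
        raise ValueError("at least one recognizer is required")
    hypotheses = [recognize(audio) for recognize in recognizers.values()]
    return max(hypotheses, key=lambda h: h.confidence)


def select_by_language_id(
    audio: bytes,
    recognizers: Dict[str, Recognizer],
    identify_language: Callable[[bytes], str],
) -> Hypothesis:
    """Baseline: an explicit language ID step picks a single recognizer to run."""
    return recognizers[identify_language(audio)](audio)


def make_stub(language: str, text: str, confidence: float) -> Recognizer:
    """Build a fake recognizer that ignores the audio (illustration only)."""
    return lambda audio: Hypothesis(language, text, confidence)


# Made-up confidences; they are not results from the paper.
recognizers = {
    "en": make_stub("en", "weather in paris", 0.62),
    "fr": make_stub("fr", "météo à paris", 0.91),
    "zh": make_stub("zh", "巴黎天气", 0.18),
}

best = combine(b"", recognizers)
print(best.language, best.text)
```

In this sketch the baseline commits to one language before decoding, so a language identification error cannot be recovered, whereas the combined system defers the decision until every recognizer has produced a scored hypothesis. The price is running all recognizers on every utterance.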
