Improving ASR performance on non-native speech using multilingual and crosslingual information

This paper presents our latest investigation of automatic speech recognition (ASR) on non-native speech. We first report on a non-native speech corpus, an extension of the GlobalPhone database, which contains English spoken with Bulgarian, Chinese, German, and Indian accents, and German spoken with a Chinese accent. In the English data, English is the spoken language (L2), and Bulgarian, Chinese, German, and Indian are the speakers' mother tongues (L1); in the German data, German is L2 and Chinese is L1. We then investigate the effect of multilingual acoustic modeling on non-native speech. Our results show that a bilingual L1-L2 acoustic model significantly improves ASR performance on non-native speech. When L1 is unknown or no L1 data is available, a multilingual ASR system trained without L1 speech data consistently outperforms the monolingual L2 ASR system. Finally, we propose a method called crosslingual accent adaptation, which uses English with a Chinese accent to improve German ASR on German with a Chinese accent, and vice versa. Without using any intralingual adaptation data, we achieve an average relative improvement of 15.8% over the baseline system.
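The abstract reports its result as a relative improvement but does not state the metric or how the average is taken. The sketch below assumes the metric is word error rate (WER) and that "on average" means the mean of the per-test-set relative reductions, (WER_baseline − WER_new) / WER_baseline. Both are assumptions, and the function names and WER values are hypothetical illustrations, not results from the paper.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


def average_relative_improvement(pairs):
    """Mean of per-test-set relative improvements (an assumed reading of 'on average')."""
    scores = [relative_improvement(base, new) for base, new in pairs]
    return sum(scores) / len(scores)


# Hypothetical (baseline WER, adapted WER) pairs in percent; NOT data from the paper.
pairs = [(40.0, 34.0), (50.0, 42.0)]
print(round(relative_improvement(40.0, 34.0), 1))     # 15.0
print(round(average_relative_improvement(pairs), 1))  # 15.5
```

An alternative reading computes the relative improvement of the averaged WERs instead. The two agree only when all test sets share the same baseline WER.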