On using units trained on foreign data for improved multiple accent speech recognition

Foreign-accented speech recognition systems have to deal with acoustic realizations of sounds produced by non-native speakers that do not always match native speech models. As standard native speech modeling alone is generally not adequate, it is usually extended with phoneme models estimated from speech data of foreign languages, and often complemented with extra pronunciation variants. This paper focuses on the speech recognition of multiple non-native accents. The speech corpus used was recorded from speakers originating from 24 different countries. The introduction of models of phonemes of the target language adapted on foreign speech data is presented and detailed. For the recognition of non-native speech comprising multiple foreign accents, this approach provides better performance than the introduction of standard foreign units. The selection of the most frequent acoustic variants for each phoneme is also discussed, as this method makes recognition results more homogeneous across speaker language groups. Furthermore, the adaptation of the acoustic models on non-native speech data is studied. Results show that detailed models, which include the modeling of extra pronunciation variants through acoustic units estimated on foreign data, benefit more from the task and accent adaptation process than the baseline standard models used for native speech recognition. In addition, experiments show that adapting the acoustic models on a limited set of foreign accents improves speech recognition performance even on foreign accents absent from the adaptation data.
