Adaptation in the pronunciation space for non-native speech recognition

We introduce a new technique for improving the recognition of non-native speech. The underlying assumption is that for each non-native pronunciation of a speech sound, at least one sound in the target language has a similar native pronunciation. Adaptation is performed by HMM interpolation between suitable native acoustic models, and the interpolation partners are determined automatically in a data-driven manner. Our experiments show that the technique is suitable both for offline adaptation to a whole group of speakers and for unsupervised online adaptation to a single speaker. Results are given for spontaneous non-native English speech and for a set of read non-native German utterances.
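The idea of interpolating between native acoustic models and choosing the interpolation partner from data can be illustrated with a minimal sketch. The abstract does not give the interpolation formula or the selection criterion, so everything below is an assumption made for illustration: each HMM state is reduced to a one-dimensional Gaussian mixture, interpolation is a linear mix of two output densities with a hypothetical weight `rho`, and the partner is the candidate that maximizes the likelihood of some adaptation frames. The names `interpolate`, `gmm_loglik`, and `pick_partner` are hypothetical.

```python
import math


def gauss_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def interpolate(gmm_a, gmm_b, rho):
    """Linear interpolation of two output densities (assumed form).

    Each GMM is a list of (weight, mean, var) triples. The result is the
    density (1 - rho) * p_a(x) + rho * p_b(x), expressed as one mixture.
    """
    return ([(w * (1.0 - rho), m, v) for w, m, v in gmm_a]
            + [(w * rho, m, v) for w, m, v in gmm_b])


def gmm_loglik(gmm, frames):
    """Total log-likelihood of the frames under a mixture density."""
    total = 0.0
    for x in frames:
        total += math.log(sum(w * math.exp(gauss_logpdf(x, m, v))
                              for w, m, v in gmm))
    return total


def pick_partner(target, candidates, frames, rho=0.5):
    """Pick the candidate whose interpolation with the target model
    best explains the adaptation frames (assumed selection criterion)."""
    return max(candidates,
               key=lambda name: gmm_loglik(
                   interpolate(target, candidates[name], rho), frames))


# Toy data: the adaptation frames lie between the target model and
# candidate "a", so "a" should be the better interpolation partner.
target = [(1.0, 0.0, 1.0)]
candidates = {"a": [(1.0, 2.0, 1.0)], "b": [(1.0, -5.0, 1.0)]}
frames = [1.5, 2.1, 1.8]
best = pick_partner(target, candidates, frames)
mixed = interpolate(target, candidates[best], 0.5)
```

In this toy setting the selection is driven only by the adaptation frames, which mirrors the data-driven choice of partners described above. A real recognizer would work with multivariate densities and full HMMs rather than single one-dimensional states.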
