Speech recognition for multiple non-native accent groups with speaker-group-dependent acoustic models

In this paper, the recognition performance for non-native English speech with two different kinds of speaker-group-dependent acoustic models is investigated. The approaches for creating speaker groups include knowledge-based grouping of non-native speakers by their first language and automatic clustering of speakers. Clustering is based on speaker-dependent acoustic models in speaker eigenspace. The acoustic model for each speaker group is obtained either by bootstrapping with pre-segmented speech data or by adapting a speaker-independent native baseline model. To decode an utterance from a non-native speaker not seen during the training or adaptation phase, a model suited to the accent characteristics of that speaker must be selected. Here, ideal selection via an oracle and parallel decoding are examined. Evaluation is conducted on a hotel reservation task for five major accent groups: German, French, Indonesian, Chinese and Japanese speakers. Recognition results with speaker-dependent models and an accent-independent non-native model are also reported.
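The two model-selection strategies named above can be illustrated with a minimal sketch. It is not the system described in the paper: the decoder interface (a callable returning a hypothesis and a log-likelihood score), the toy scoring function, and all names such as `parallel_decode`, `oracle_decode` and `make_toy_decoder` are assumptions made for illustration. Parallel decoding is read here as running every group model on the utterance and keeping the highest-scoring result, while the oracle simply uses the model of the speaker's known group.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical decoder interface: features -> (hypothesis, log-likelihood score).
Decoder = Callable[[Sequence[float]], Tuple[str, float]]


def parallel_decode(features: Sequence[float],
                    decoders: Dict[str, Decoder]) -> Tuple[str, str, float]:
    """Run every group-dependent decoder and keep the best-scoring result."""
    if not decoders:
        raise ValueError("at least one decoder is required")
    results = {group: decode(features) for group, decode in decoders.items()}
    best_group = max(results, key=lambda group: results[group][1])
    hypothesis, score = results[best_group]
    return best_group, hypothesis, score


def oracle_decode(features: Sequence[float],
                  decoders: Dict[str, Decoder],
                  true_group: str) -> Tuple[str, str, float]:
    """Ideal selection: decode with the model of the speaker's known group."""
    hypothesis, score = decoders[true_group](features)
    return true_group, hypothesis, score


def make_toy_decoder(group: str, center: float) -> Decoder:
    """Toy stand-in for a recognizer (assumption): the score is the negative
    squared distance of the features from a per-group center value."""
    def decode(features: Sequence[float]) -> Tuple[str, float]:
        score = -sum((x - center) ** 2 for x in features)
        return f"hypothesis from {group} model", score
    return decode


if __name__ == "__main__":
    toy_decoders = {
        "German": make_toy_decoder("German", 0.0),
        "French": make_toy_decoder("French", 1.0),
        "Japanese": make_toy_decoder("Japanese", 5.0),
    }
    utterance = [1.0, 1.1]
    print(parallel_decode(utterance, toy_decoders))
    print(oracle_decode(utterance, toy_decoders, "German"))
```

In this sketch the oracle needs the true group label, which is not available for an unseen speaker in practice; parallel decoding avoids that requirement at the cost of one decoding pass per group model.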
