Acoustic Model Interpolation for Non-Native Speech Recognition

This paper proposes three interpolation techniques that use acoustic models of both the target language and the speaker's native language to improve non-native speech recognition: manual interpolation, weighted least squares, and eigenvoices. Each can be used under different situations and constraints. Unlike the weighted least squares and eigenvoices methods, manual interpolation can be performed offline without any adaptation data. These methods can also be combined with MLLR (maximum likelihood linear regression) to improve the recognition rate. Experiments presented in this paper show that the best non-native adaptation method, combined with MLLR, can yield an absolute WER reduction of 10% on a French automatic speech recognition system for both Chinese and Vietnamese native speakers.
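As a rough illustration of why manual interpolation needs no adaptation data, the sketch below blends one Gaussian mean vector from a target-language model with the corresponding mean from a native-language model using a hand-chosen weight. The abstract does not say which model parameters are interpolated or how models are paired, so the function name `interpolate_means`, the choice of mean vectors, and the weight value are all assumptions made for illustration, not the authors' procedure.

```python
from typing import List


def interpolate_means(target_mean: List[float],
                      native_mean: List[float],
                      weight: float) -> List[float]:
    """Convex combination of two mean vectors (hypothetical helper).

    weight = 1.0 keeps the target-language mean unchanged;
    weight = 0.0 replaces it with the native-language mean.
    """
    if len(target_mean) != len(native_mean):
        raise ValueError("mean vectors must have the same dimension")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # The weight is fixed by hand (assumption), so no speech from the
    # non-native speaker is needed to compute the result.
    return [weight * t + (1.0 - weight) * n
            for t, n in zip(target_mean, native_mean)]


# Toy two-dimensional means, invented purely for the example.
blended = interpolate_means([0.0, 2.0], [2.0, 4.0], weight=0.5)
```

In this reading, the weighted least squares and eigenvoices methods would differ by estimating the combination from adaptation data rather than fixing it in advance.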
