Unsupervised acoustic model adaptation for multi-origin non-native ASR

To date, speech and language recognition systems perform poorly on non-native speech. The challenge in non-native speech recognition is to maximize recognition accuracy when only a small amount of non-native data is available. We report on acoustic model adaptation for improving the recognition of non-native speech in English, French and Vietnamese, spoken by speakers of different origins. Using online unsupervised acoustic model adaptation, without any additional adaptation data, we investigate how an unsupervised multilingual acoustic model interpolation method can improve the phone accuracy of the system. The experiments yield an absolute improvement of 7% in phone-level accuracy (PLA), which demonstrates the feasibility of the method.
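The abstract does not spell out how the acoustic models are interpolated. As a rough illustration only, one simple form of model interpolation is a weighted average of corresponding Gaussian mean vectors taken from several language-specific models. In the sketch below, the function name `interpolate_means`, the language keys, the weights and the toy vectors are all hypothetical and are not taken from the paper; in an unsupervised setting the weights might come from some automatic estimate of the speaker's origin, which is likewise an assumption here.

```python
from typing import Dict, List


def interpolate_means(
    means: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-language Gaussian mean vectors.

    Hypothetical helper for illustration; not the paper's implementation.
    `means` maps a language tag to the mean vector of one Gaussian,
    `weights` maps the same tags to non-negative interpolation weights.
    Weights are normalized so that they sum to one.
    """
    total = sum(weights.get(lang, 0.0) for lang in means)
    if total <= 0.0:
        raise ValueError("interpolation weights must sum to a positive value")

    dim = len(next(iter(means.values())))
    result = [0.0] * dim
    for lang, mean in means.items():
        if len(mean) != dim:
            raise ValueError("all mean vectors must share one dimension")
        w = weights.get(lang, 0.0) / total  # normalized weight
        for i, value in enumerate(mean):
            result[i] += w * value
    return result


# Toy example: equal weighting of a target-language model and a
# native-language model (values are made up).
toy_means = {"target": [0.0, 2.0], "native": [2.0, 4.0]}
toy_weights = {"target": 1.0, "native": 1.0}
interpolated = interpolate_means(toy_means, toy_weights)
```

With equal weights the result is the midpoint of the two mean vectors; shifting weight toward one language pulls the interpolated model toward that language's acoustics.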
