Lexical and acoustic modeling of non-native speech in LVCSR

As non-native speakers become more frequent users of speech recognition applications, increasing the system's tolerance of non-native pronunciation and language use is important and is currently the focus of research in a variety of contexts. Dictionary modification, acoustic model adaptation, and acoustic model manipulation are among the techniques that have been reported to improve recognition of non-native speech. In this paper, we address the specific case of Japanese-accented English, describing the lexical and acoustic modeling techniques that give the best recognizer performance. We find that automatically generated pronunciation variants perform as well as hand-coded “golden” variants in reducing recognizer error, and that a significant improvement in system performance can be achieved with acoustic models retrained on a small amount of accented data.
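The abstract does not describe how the pronunciation variants were generated automatically. As one illustration of the general idea, the Python sketch below assumes that phone-level alignments between dictionary baseforms and the phones observed in accented speech are already available. The alignments, ARPAbet-style phone symbols, thresholds (`min_prob`, `min_count`, `max_subs`), and function names are all hypothetical and are not taken from the paper. The sketch estimates how often each canonical phone is realized as some other phone, keeps sufficiently frequent substitutions as rewrite rules, and expands each baseform with variants containing at most one substituted phone.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical phone-level alignments (ARPAbet-style symbols) of dictionary
# baseform phones against the phones actually observed in accented speech.
# Each utterance is a list of (canonical_phone, observed_phone) pairs.
ALIGNMENTS = [
    [("R", "L"), ("AY", "AY"), ("S", "S")],                 # "rice"
    [("R", "R"), ("AY", "AY"), ("S", "S")],                 # "rice"
    [("R", "L"), ("EH", "EH"), ("D", "D")],                 # "red"
    [("L", "R"), ("AY", "AY"), ("T", "T")],                 # "light"
    [("L", "R"), ("IH", "IH"), ("V", "B")],                 # "live"
    [("V", "B"), ("EH", "EH"), ("R", "R"), ("IY", "IY")],   # "very"
    [("TH", "S"), ("IH", "IH"), ("NG", "NG"), ("K", "K")],  # "think"
]

# Hypothetical baseform lexicon to be expanded with variants.
LEXICON = {
    "VERY": ["V", "EH", "R", "IY"],
    "THINK": ["TH", "IH", "NG", "K"],
}


def learn_substitutions(alignments, min_prob=0.2, min_count=2):
    """Keep canonical->observed substitutions that are frequent enough.

    A substitution becomes a rule when it occurs at least `min_count` times
    and accounts for at least `min_prob` of the realizations of that phone.
    """
    counts = defaultdict(Counter)
    for utterance in alignments:
        for canonical, observed in utterance:
            counts[canonical][observed] += 1

    rules = defaultdict(list)
    for canonical, observed_counts in counts.items():
        total = sum(observed_counts.values())
        for observed, n in observed_counts.items():
            if observed != canonical and n >= min_count and n / total >= min_prob:
                rules[canonical].append(observed)
    return dict(rules)


def expand_entry(baseform, rules, max_subs=1):
    """Return the baseform plus variants with at most `max_subs` substituted phones."""
    # Canonical phone listed first, so the unmodified baseform is produced first.
    options = [[phone] + rules.get(phone, []) for phone in baseform]
    variants = []
    for candidate in product(*options):
        n_changed = sum(c != b for c, b in zip(candidate, baseform))
        if n_changed <= max_subs:
            variants.append(" ".join(candidate))
    return variants


def expand_dictionary(lexicon, rules, max_subs=1):
    """Map each word to its list of pronunciation variants."""
    return {word: expand_entry(phones, rules, max_subs) for word, phones in lexicon.items()}


if __name__ == "__main__":
    rules = learn_substitutions(ALIGNMENTS)
    print(rules)
    for word, variants in expand_dictionary(LEXICON, rules).items():
        for pron in variants:
            print(f"{word}\t{pron}")
```

With these toy alignments, R→L, L→R, and V→B become rules, while TH→S is seen only once and is filtered out by `min_count`, so THINK keeps only its baseform. Capping substitutions per variant with `max_subs` keeps the lexicon from growing combinatorially. Adding many variants can also increase confusability between words, so thresholds of this kind would need tuning on held-out data.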
