Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling

This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Acoustic model adaptation can improve the recognition rate for non-native speakers, but it has difficulty dealing with pronunciation errors such as phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach is based on acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: MAP (maximum a posteriori) adaptation and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus. We present a novel way to introduce these automatically learned confusion rules into the recognition system: the modified HMM of a phoneme of the spoken (foreign) language includes its canonical pronunciation along with all the alternate non-native pronunciations, so that phonemes pronounced correctly by a non-native speaker can still be recognized. We evaluate our approaches on the non-native corpus of the European project HIWIRE, which contains English sentences pronounced by French, Italian, Greek and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. Our approach gives better recognition results than classical acoustic adaptation of the HMMs when the native language of the speaker is known. We obtain a 22% WER reduction compared to the reference system. Furthermore, we take into account the written form of the spoken words, since non-native speakers may rely on the spelling of a word in order to pronounce it. This extension does not provide any further improvement.
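As an illustration of the confusion-rule extraction step, the following is a minimal sketch and not the procedure used in the article. It assumes that each canonical phoneme of the spoken language has already been aligned with the phoneme sequence a non-native speaker actually produced (for example, by comparing a forced alignment with the output of a phonetic recognizer). The function name `extract_confusion_rules`, the threshold `min_rel_freq`, and the toy phoneme labels are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_confusion_rules(aligned_pairs, min_rel_freq=0.1):
    """Collect alternate pronunciations per canonical phoneme.

    aligned_pairs: iterable of (canonical_phoneme, realized_phonemes),
        where realized_phonemes is a list of phoneme labels.
    min_rel_freq: hypothetical pruning threshold on relative frequency.
    Returns {canonical: {realized_tuple: relative_frequency}}.
    """
    counts = defaultdict(Counter)
    for canonical, realized in aligned_pairs:
        counts[canonical][tuple(realized)] += 1

    rules = {}
    for canonical, realizations in counts.items():
        total = sum(realizations.values())
        kept = {
            realized: n / total
            for realized, n in realizations.items()
            # Skip the canonical pronunciation itself and rare confusions.
            if realized != (canonical,) and n / total >= min_rel_freq
        }
        if kept:
            rules[canonical] = kept
    return rules


# Toy alignment data (invented for illustration only).
toy_alignments = [
    ("th", ["s"]), ("th", ["s"]), ("th", ["th"]), ("th", ["z"]),
    ("ih", ["iy"]), ("ih", ["ih"]), ("ih", ["ih"]), ("ih", ["ih"]),
]
rules = extract_confusion_rules(toy_alignments, min_rel_freq=0.3)
print(rules)
```

In this sketch each retained rule maps a canonical phoneme to an alternate realization; in a system like the one described above, such alternates would be added as parallel paths alongside the canonical model rather than replacing it.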
