Non-Native Pronunciation Variation Modeling for Automatic Speech Recognition

Speech is an inherently natural means of communication, and the ability to use it is acquired unconsciously and gradually throughout life. To bring the benefits of speech communication to devices, a great deal of research has been carried out over the past several decades. As a result, automatic speech recognition (ASR) systems have been deployed in a range of applications, including automatic reservation systems, dictation systems, and navigation systems. With increasing globalization, the need for effective interlingual communication has also been growing. However, because most people speak foreign languages with variant or non-fluent pronunciations, there is an increasing demand for non-native ASR systems (Goronzy et al., 2001). A conventional ASR system is optimized for native speech, whereas non-native speech has different characteristics: it tends to reflect the pronunciation and syntactic characteristics of the speaker's mother tongue, and fluency varies widely among non-native speakers. Consequently, the performance of an ASR system degrades severely on non-native speech compared with native speech, owing to the mismatch between the native training data and the non-native test data (Compernolle, 2001).

A simple way to improve ASR performance on non-native speech would be to train the system on a non-native speech database; in practice, however, the amount of non-native speech currently available is not sufficient for this. Thus, techniques that improve non-native ASR performance using only a small amount of non-native speech are required. There have been three major approaches to handling non-native speech in ASR: acoustic modeling, language modeling, and pronunciation modeling.
First, acoustic modeling approaches find pronunciation differences and transform and/or adapt acoustic models to include the effects of non-native speech (Gruhn et al., 2004; Morgan, 2004; Steidl et al., 2004). Second, language modeling approaches deal with the grammatical effects or speaking style of non-native speech (Bellegarda, 2001). Third, pronunciation modeling approaches derive pronunciation variant rules from non-native speech and apply the derived rules to pronunciation models for non-native speech (Amdal et al., 2000; Fosler-Lussier, 1999; Goronzy et al., 2004; Gruhn et al., 2004; Raux, 2004; Strik et al., 1999).

Source: Advances in Speech Recognition, book edited by Noam R. Shabtai, ISBN 978-953-307-097-1, pp. 164, September 2010, Sciyo, Croatia.
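As a rough illustration of the pronunciation modeling idea described above (deriving variant rules from non-native speech and applying them to a pronunciation model), the following minimal sketch aligns canonical and observed phone sequences with a Levenshtein alignment, keeps frequent substitutions as rules, and expands a lexicon entry with them. It is not the method of any of the cited works; the function names (`align`, `derive_rules`, `expand`), the `min_prob` threshold, and the toy phone data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def align(canonical, observed):
    """Levenshtein-align two phone lists.

    Returns (canonical_phone, observed_phone) pairs; None marks a
    deletion (observed side) or an insertion (canonical side).
    """
    n, m = len(canonical), len(observed)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (canonical[i - 1] != observed[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner, preferring match/substitution.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (canonical[i - 1] != observed[j - 1])):
            pairs.append((canonical[i - 1], observed[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, observed[j - 1]))   # insertion
            j -= 1
    return pairs[::-1]


def derive_rules(utterances, min_prob=0.5):
    """Keep substitutions c -> o whose relative frequency is >= min_prob.

    Insertions and deletions are ignored in this simplified sketch.
    """
    totals, subs = Counter(), Counter()
    for canonical, observed in utterances:
        for c, o in align(canonical, observed):
            if c is None:
                continue
            totals[c] += 1
            if o is not None and o != c:
                subs[(c, o)] += 1
    return {rule: count / totals[rule[0]]
            for rule, count in subs.items()
            if count / totals[rule[0]] >= min_prob}


def expand(pronunciation, rules):
    """Generate all variants of a pronunciation under the substitution rules."""
    options = [[p] + [o for (c, o) in rules if c == p] for p in pronunciation]
    return [list(variant) for variant in product(*options)]


# Hypothetical toy data: (canonical phones, phones observed from a non-native speaker).
toy_data = [
    (["F", "AY", "V"], ["P", "AY", "V"]),
    (["F", "UH", "T"], ["P", "UH", "T"]),
    (["F", "IY", "L"], ["F", "IY", "L"]),
    (["R", "AY", "T"], ["L", "AY", "T"]),
]
```

For example, `derive_rules(toy_data)` keeps the substitutions F to P and R to L, and `expand(["F", "R", "IY"], rules)` then yields four variants of that entry. A practical system would also have to prune confusable variants and handle insertions, deletions, and phonetic context, which this sketch leaves out.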

[1]  Dirk Van Compernolle.  Recognizing speech of goats, wolves, sheep and ... non-natives , 2001, Speech Commun..

[2]  Elmar Nöth,et al.  Adaptation in the pronunciation space for non-native speech recognition , 2004, INTERSPEECH.

[3]  Atsunori Ogawa,et al.  Non-native English speech recognition using bilingual English lexicon and acoustic models , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[4]  Hong Kook Kim,et al.  Optimizing multiple pronunciation dictionary based on a confusability measure for non-native speech recognition , 2008 .

[5]  Thomas Schaaf,et al.  Dictionary refinements based on phonetic consensus and non-uniform pronunciation reduction , 2004, INTERSPEECH.

[6]  Muhammad Ghulam,et al.  Study on pharyngeal and uvular consonants in foreign accented Arabic for ASR , 2010, Comput. Speech Lang..

[7]  Hong Kook Kim,et al.  MLLR/MAP adaptation using pronunciation variation for non-native speech recognition , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[8]  Filipp Korkmazskiy,et al.  Joint pronunciation modelling of non-native speakers using data-driven methods , 2000, INTERSPEECH.

[9]  T. Svendsen.  Pronunciation modeling for speech technology , 2004, 2004 International Conference on Signal Processing and Communications (SPCOM '04).

[10]  Ralf Kompe,et al.  Generating non-native pronunciation variants for lexicon adaptation , 2004, Speech Commun..

[11]  Daniel Jurafsky,et al.  Building multiple pronunciation models for novel words using exploratory computational phonology , 1995, EUROSPEECH.

[12]  Tien Ping Tan,et al.  Acoustic Model Interpolation for Non-Native Speech Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[13]  Eric Fosler-Lussier,et al.  Multi-level decision trees for static and dynamic pronunciation models , 1999, EUROSPEECH.

[14]  L. Nygaard,et al.  Perceptual learning of systematic variation in Spanish-accented speech. , 2009, The Journal of the Acoustical Society of America.

[15]  Helmer Strik,et al.  Modeling pronunciation variation for ASR: A survey of the literature , 1999, Speech Commun..

[16]  J. Ross Quinlan.  C4.5: Programs for Machine Learning , 1992 .

[17]  Thomas Fang Zheng,et al.  State-dependent phoneme-based model merging for dialectal Chinese speech recognition , 2006, Speech Commun..

[18]  Young-Ju Lee,et al.  Design and Construction of Korean-Spoken English Corpus (K-SEC) , 2004 .

[19]  Hong Kook Kim,et al.  On the use of feature-space MLLR adaptation for non-native speech recognition , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[20]  Hong Wei,et al.  Acoustic models adaptation in large vocabulary continuous Mandarin speech recognition for non-native speakers , 2004, 7th International Conference on Signal Processing (ICSP '04).

[21]  Tanja Schultz,et al.  Comparison of acoustic model adaptation techniques on non-native speech , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[22]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[23]  Katarina Bartkova,et al.  Using Multilingual Units for Improved Modeling of Pronunciation Variants , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[24]  Jean Paul Haton,et al.  Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration and Graphemic Constraints , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[25]  Arun C. Surendran,et al.  Data-driven pronunciation modelling for non-native speakers using association strength between phones , 2005 .

[26]  Marina Sahakyan,et al.  Is non-native pronunciation modelling necessary ? , 2001, INTERSPEECH.

[27]  Stefan Schaden.  Generating Non-Native Pronu Phonological R , 2003 .

[28]  Antoine Raux,et al.  Using Task-Oriented Spoken Dialogue Systems for Language Learning: Potential, Practical Applications and Challenges , 2004 .

[29]  John J. Morgan,et al.  Making a Speech Recognizer Tolerate Non-native Speech through Gaussian Mixture Merging , 2004 .

[30]  Elmar Nöth,et al.  Non-native speech databases , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[31]  Antoine Raux,et al.  Automated lexical adaptation and speaker clustering based on pronunciation habits for non-native speech recognition , 2004, INTERSPEECH.

[32]  Hong Kook Kim,et al.  Acoustic and pronunciation model adaptation for context-independent and context-dependent pronunciation variability of non-native speech , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[33]  Hong Kook Kim,et al.  Non-native pronunciation variation modeling using an indirect data driven method , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[34]  Chung-Hsien Wu,et al.  Unsupervised pronunciation grammar growing using knowledge-based and data-driven approaches , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[35]  Lin-Shan Lee,et al.  Improved pronunciation modeling by properly integrating better approaches for baseform generation, ranking and pruning , 2000 .

[36]  M.K.C. MacMahon.  International Phonetic Association , 2006 .

[37]  Janet M. Baker,et al.  The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.

[38]  Yunxin Zhao,et al.  Fast model selection based speaker adaptation for nonnative speech , 2003, IEEE Trans. Speech Audio Process..

[39]  Richard Wiseman,et al.  Dynamic and static improvements to lexical baseforms , 1997, EUROSPEECH.

[40]  Allard Jongman,et al.  Effects of Acoustic Variability in the Perceptual Learning of Non-Native-Accented Speech Sounds , 2007, Phonetica.

[41]  Hong Kook Kim,et al.  Acoustic Model Adaptation Based on Pronunciation Variability Analysis for Non-Native Speech Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[42]  S. J. Young,et al.  Tree-based state tying for high accuracy acoustic modelling , 1994 .

[43]  Rüdiger Hoffmann,et al.  Automatic Learning and Optimization of Pronunciation Dictionaries , 2001 .

[44]  J. Bellegarda.  An Overview of Statistical Language Model Adaptation , 2001 .

[45]  Satoshi Nakamura,et al.  A statistical lexicon for non-native speech recognition , 2004, INTERSPEECH.

[46]  Irina Illina,et al.  Combined acoustic and pronunciation modelling for non-native speech recognition , 2007, INTERSPEECH.