Adaptation to non-native speech using evolutionary-based discriminative linear transforms

This paper addresses adaptation to non-native speech in a large-vocabulary speech recognition system for Modern Standard Arabic (MSA). We present a technique that adapts Hidden Markov Models (HMMs) to foreign accents using Genetic Algorithms (GAs) in unsupervised mode. The GA components, namely the genetic operators and the objective function, are chosen to yield a more reliable global linear transformation matrix. The Minimum Phone Error (MPE) criterion serves as the objective function. The West Point Linguistic Data Consortium (LDC) Modern Standard Arabic database is used throughout our experiments. Results show that the evolutionary-based approach achieves a significant decrease in word error rate compared with conventional Maximum Likelihood Linear Regression (MLLR), Maximum a Posteriori (MAP) adaptation, and adaptation combining MLLR with MPE-based training.
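
As a rough illustration of the idea, the sketch below evolves a single global affine transform of Gaussian mean vectors with a real-coded GA. It is not the system described here: the fitness function is a toy squared-error stand-in rather than the MPE criterion (which would require recognition lattices and phone-level accuracies), and the operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), the function names, and all parameter values are assumptions chosen for illustration.

```python
import random

def apply_transform(w, mean):
    """Adapted mean = A * mean + b, where row i of w is [b_i, A_i1, ..., A_id]."""
    return [row[0] + sum(a * m for a, m in zip(row[1:], mean)) for row in w]

def identity_transform(dim):
    """Transform that leaves every mean unchanged (zero bias, identity matrix)."""
    return [[0.0] + [1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

def toy_fitness(w, means, targets):
    """Hypothetical placeholder objective (higher is better). NOT the MPE criterion."""
    err = 0.0
    for mean, target in zip(means, targets):
        adapted = apply_transform(w, mean)
        err += sum((a - t) ** 2 for a, t in zip(adapted, target))
    return -err

def evolve_transform(means, targets, pop_size=30, generations=200, sigma=0.1, seed=0):
    """Evolve one global transform shared by all means; returns the fittest individual."""
    rng = random.Random(seed)
    base = identity_transform(len(means[0]))

    def fitness(w):
        return toy_fitness(w, means, targets)

    def mutate(w):
        # Gaussian perturbation of every matrix entry.
        return [[x + rng.gauss(0.0, sigma) for x in row] for row in w]

    def crossover(p, q):
        # Arithmetic crossover: a random convex combination of the two parents.
        alpha = rng.random()
        return [[alpha * x + (1 - alpha) * y for x, y in zip(rp, rq)]
                for rp, rq in zip(p, q)]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # Start from the identity transform plus perturbed copies of it.
    population = [base] + [mutate(base) for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: keep the current best
        while len(children) < pop_size:
            child = crossover(tournament(population), tournament(population))
            children.append(mutate(child))
        population = children
    return max(population, key=fitness)

# Toy data: three 2-dimensional "native" means and hypothetical accented targets.
means = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
targets = [[0.5, -0.2], [1.6, -0.2], [0.5, 0.7]]
best = evolve_transform(means, targets)
```

Because the identity transform is in the initial population and the best individual is always carried over, the returned transform can never score worse than leaving the models unadapted. In a real recognizer the chromosome would still encode the entries of the transformation matrix, but the fitness would be computed from recognition hypotheses on the adaptation data.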
