Accent-independent universal HMM-based speech recognizer for American, Australian and British English

This paper addresses the problem of speech recognition under accent variation in the English language. Previous research has demonstrated that the multi-transitional model architecture is one solution for robust speech recognition. In this study, we describe a universal hybrid system that is trained with data from American, Australian, and British accented speech. Experimental results on a connected-digit recognition task show an average string error rate reduction of about 62% and 8% when compared to our best monolingual and multi-transitional systems, respectively. The results indicate that the universal model is about three times faster and about half the size of the multi-transitional or multilingual models, which makes it an ideal choice for practical accent-independent speech recognition applications.
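As an illustration of the metric quoted above, the following minimal sketch shows how a string error rate and a reduction between two systems could be computed. It assumes that the reported reduction is relative to the baseline system, which the abstract does not state explicitly. The function names and the toy digit strings are hypothetical and are not taken from the paper's experiments.

```python
def string_error_rate(references, hypotheses):
    """Fraction of utterances whose recognized digit string differs from the reference.

    A string counts as an error if it is not an exact match, no matter
    how many individual digits are wrong.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("at least one utterance is required")
    errors = sum(1 for ref, hyp in zip(references, hypotheses) if ref != hyp)
    return errors / len(references)


def relative_reduction(baseline_rate, improved_rate):
    """Relative error rate reduction of the improved system over the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_rate - improved_rate) / baseline_rate


# Toy data (hypothetical, for illustration only).
references = ["1 2 3", "4 5", "6 7 8 9", "0 1"]
baseline_hyps = ["1 2 3", "4 6", "6 7 8", "0 1"]    # 2 of 4 strings wrong
improved_hyps = ["1 2 3", "4 5", "6 7 8", "0 1"]    # 1 of 4 strings wrong

baseline_ser = string_error_rate(references, baseline_hyps)
improved_ser = string_error_rate(references, improved_hyps)
reduction = relative_reduction(baseline_ser, improved_ser)
```

Under this reading, a reduction of about 62% would mean that the universal system's string error rate is roughly 38% of the monolingual baseline's rate.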
