Accent Issues in Large Vocabulary Continuous Speech Recognition

This paper addresses accent issues in large vocabulary continuous speech recognition. Cross-accent experiments show that accent has a dominant effect on recognition performance. Analysis with multivariate statistical tools (principal component analysis and independent component analysis) confirms that accent is one of the key factors in speaker variability. For different application scenarios, we propose two methods for accent adaptation. When a limited amount of adaptation data is available, pronunciation dictionary modeling is used to reduce recognition errors caused by pronunciation mistakes. When a large corpus has been collected for each accent type, accent-dependent models are trained, and a Gaussian mixture model (GMM)-based accent identification system is developed to select among them. We report experimental results for both schemes and verify their effectiveness in each situation.
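The model-selection step of the second scheme can be sketched as follows. One GMM is trained per accent. An utterance is assigned to the accent whose GMM gives the highest total log-likelihood, and the matching accent-dependent recognizer is then chosen. This is a minimal sketch under stated assumptions, not the paper's implementation. The diagonal covariances, the number of components, the iteration count, the synthetic 2-D features, and the names `DiagGMM`, `identify_accent`, `select_acoustic_model` and the model IDs are all hypothetical. A real system would presumably use acoustic feature vectors such as cepstral coefficients.

```python
import math
import random
from typing import Dict, List, Sequence

Frame = Sequence[float]


class DiagGMM:
    """Hypothetical diagonal-covariance Gaussian mixture model trained with EM."""

    def __init__(self, n_components: int, seed: int = 0) -> None:
        self.k = n_components
        self.rng = random.Random(seed)
        self.weights: List[float] = []
        self.means: List[List[float]] = []
        self.vars: List[List[float]] = []

    def _log_joint(self, x: Frame) -> List[float]:
        # log(w_m) + log N(x; mu_m, diag(var_m)) for every component m.
        out = []
        for w, mu, var in zip(self.weights, self.means, self.vars):
            ll = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - u) ** 2 / v)
                     for xi, u, v in zip(x, mu, var))
            out.append(math.log(w) + ll)
        return out

    @staticmethod
    def _logsumexp(vals: List[float]) -> float:
        top = max(vals)
        return top + math.log(sum(math.exp(v - top) for v in vals))

    def fit(self, frames: List[Frame], n_iter: int = 20,
            var_floor: float = 1e-3) -> "DiagGMM":
        dim = len(frames[0])
        # Initialise: means from random frames, unit variances, uniform weights.
        self.means = [list(f) for f in self.rng.sample(frames, self.k)]
        self.vars = [[1.0] * dim for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(n_iter):
            # E-step: posterior probability of each component for each frame.
            resp = []
            for x in frames:
                lj = self._log_joint(x)
                norm = self._logsumexp(lj)
                resp.append([math.exp(v - norm) for v in lj])
            # M-step: re-estimate weights, means and (floored) variances.
            for m in range(self.k):
                nm = sum(r[m] for r in resp) + 1e-10
                self.weights[m] = nm / len(frames)
                self.means[m] = [sum(r[m] * x[d] for r, x in zip(resp, frames)) / nm
                                 for d in range(dim)]
                self.vars[m] = [max(var_floor,
                                    sum(r[m] * (x[d] - self.means[m][d]) ** 2
                                        for r, x in zip(resp, frames)) / nm)
                                for d in range(dim)]
        return self

    def score(self, frames: List[Frame]) -> float:
        # Utterance log-likelihood: sum of per-frame mixture log-likelihoods.
        return sum(self._logsumexp(self._log_joint(x)) for x in frames)


def identify_accent(accent_gmms: Dict[str, DiagGMM], utterance: List[Frame]) -> str:
    # Maximum-likelihood decision over the per-accent GMMs.
    return max(accent_gmms, key=lambda a: accent_gmms[a].score(utterance))


def select_acoustic_model(accent_gmms: Dict[str, DiagGMM],
                          accent_models: Dict[str, str],
                          utterance: List[Frame]) -> str:
    # Route the utterance to the accent-dependent model of the detected accent.
    return accent_models[identify_accent(accent_gmms, utterance)]


# Toy usage: synthetic 2-D "features" stand in for real acoustic features.
rng = random.Random(42)


def synth(center, n):
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


train = {"accent_A": synth((0.0, 0.0), 200), "accent_B": synth((3.0, 3.0), 200)}
gmms = {a: DiagGMM(n_components=2, seed=1).fit(f) for a, f in train.items()}
models = {"accent_A": "am_accent_A", "accent_B": "am_accent_B"}  # hypothetical IDs
test_utt = synth((3.0, 3.0), 50)
predicted = identify_accent(gmms, test_utt)
chosen = select_acoustic_model(gmms, models, test_utt)
```

Scoring the whole utterance, rather than individual frames, lets evidence accumulate over time. This is why a maximum-likelihood decision over per-accent GMMs can be reasonably robust even when individual frames overlap between accents.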