Speech Synthesis for Error Training Models in CALL

A computer-assisted pronunciation teaching (CAPT) system is a fundamental component of a computer-assisted language learning (CALL) system. A CAPT system based on speech recognition often needs a large amount of speech data to train the incorrect phone models in its recognizer, but collecting incorrectly pronounced speech is labor-intensive and costly. This paper reports an effort to train the incorrect phone models on synthesized speech data instead. A special formant speech synthesizer is designed to transform correctly pronounced phones into incorrect ones by modifying their formant frequencies. Within a Putonghua CALL system that helps native Cantonese speakers learn Mandarin, a small experimental CAPT system is built around a recognizer trained on synthetic speech. Evaluation shows that a CAPT system trained on synthesized data can perform as well as, or even better than, one trained on real data, provided that the synthetic data set is large enough.
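The abstract does not describe the synthesizer's internals, so the following is only a minimal sketch of the general idea of turning a "correct" phone into an "incorrect" one by moving its formants. It assumes a cascade of second-order digital resonators in the form used by Klatt's 1980 cascade/parallel formant synthesizer, driven by a plain impulse train; it is not the authors' synthesizer. The function names (`resonator`, `synthesize_vowel`, `shift_formants`) and all formant values are hypothetical illustrations: the "correct" formants are approximate textbook values for a front vowel /i/, and the "mispronounced" targets are invented for the example.

```python
import math

def resonator(signal, freq_hz, bw_hz, fs):
    """Second-order digital resonator (Klatt-style):
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], normalized to unity gain at DC."""
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw_hz * T)
    B = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    A = 1.0 - B - C
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = A * x + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, duration_s, fs):
    """Crude glottal source: one unit impulse per pitch period."""
    n = int(duration_s * fs)
    period = int(round(fs / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.2, fs=16000):
    """Pass the source through a cascade of resonators.
    formants: list of (frequency_hz, bandwidth_hz) pairs, F1 first."""
    signal = impulse_train(f0_hz, duration_s, fs)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal

def shift_formants(formants, targets):
    """Hypothetical 'error' transform: replace selected formant frequencies,
    keeping bandwidths. targets maps formant index -> new frequency in Hz."""
    return [(targets.get(i, f), bw) for i, (f, bw) in enumerate(formants)]

# Approximate /i/ formants (hypothetical values for illustration only).
correct = [(270, 60), (2290, 90), (3010, 150)]
# Hypothetical mispronunciation: raise F1 slightly and lower F2.
incorrect = shift_formants(correct, {0: 300, 1: 1870})

correct_wave = synthesize_vowel(correct)
incorrect_wave = synthesize_vowel(incorrect)
```

Generating many such variants per phone (different speakers' F0, different shift amounts) is one plausible way to obtain the large synthetic training set the evaluation says is needed; how the paper actually chose its shifts is not stated in the abstract.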

[1] Paul Dalsgaard,et al. On the use of data-driven clustering technique for identification of poly- and mono-phonemes for four European languages , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[2] Mitch Weintraub,et al. Automatic evaluation and training in English pronunciation , 1990, ICSLP.

[3] S. McCandless,et al. An algorithm for automatic formant extraction using linear prediction spectra , 1974 .

[4] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[5] Alan Bailin,et al. Intelligent computer-assisted language learning: A bibliography , 1995, Comput. Humanit..

[6] Qin Lu,et al. A Computer Assisted Language Learning System based on Error Trends Grouping , 2007, 2007 International Conference on Natural Language Processing and Knowledge Engineering.

[7] Michael Harrington. Intelligent Computer-Assisted Language Learning , 1996 .

[8] Dennis H. Klatt,et al. Software for a cascade/parallel formant synthesizer , 1980 .

[9] Horacio Franco,et al. Automatic detection of mispronunciation for language instruction , 1997, EUROSPEECH.