Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device

We have developed a distributed text-to-audiovisual-speech synthesizer (TTAVS) to support interactivity in computer-aided pronunciation training (CAPT) on a mobile platform. The TTAVS generates audiovisual corrective feedback based on mispronunciations detected in the second-language learner's speech. Our approach encodes key visemes in SVG format, which are compressed with GZIP and transmitted to the client, where the browser performs real-time morphing to render the visual speech. We have also developed a TTAVS animation player that plays the audio and visual speech synchronously while providing play/pause/resume controls. Evaluation shows that this newly proposed approach, compared with our original approach of generating an Ogg video on the server side and streaming it to the client, achieves a 66% reduction in the average size of the output files transmitted from the server to the client and an 83% reduction in client waiting times, while preserving image quality.
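The pipeline of key-viseme interpolation plus GZIP-compressed SVG transmission can be sketched as follows. This is a minimal illustration, not the paper's implementation: the viseme control points (`VISEME_AH`, `VISEME_M`), the linear morphing function, and the frame count are all hypothetical stand-ins for the actual TTAVS viseme data and browser-side morphing.

```python
import gzip

# Two hypothetical key visemes, each a mouth outline given as a list of
# (x, y) control points. Real visemes would come from the TTAVS server.
VISEME_AH = [(10, 30), (50, 5), (90, 30), (50, 55)]   # open mouth
VISEME_M  = [(10, 30), (50, 25), (90, 30), (50, 35)]  # closed mouth

def morph(a, b, t):
    """Linearly interpolate between two visemes; t in [0, 1]."""
    return [(ax + t * (bx - ax), ay + t * (by - ay))
            for (ax, ay), (bx, by) in zip(a, b)]

def to_svg(points):
    """Render a viseme's control points as a closed SVG path."""
    d = "M" + " L".join(f"{x:.1f},{y:.1f}" for x, y in points) + " Z"
    return (f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 60">'
            f'<path d="{d}" fill="none" stroke="black"/></svg>')

# Generate intermediate frames for an open-to-closed mouth transition,
# then GZIP-compress the SVG payload before sending it to the client;
# the browser decompresses transparently under Content-Encoding: gzip.
frames = "".join(to_svg(morph(VISEME_AH, VISEME_M, i / 9)) for i in range(10))
payload = gzip.compress(frames.encode("utf-8"))
print(len(frames), len(payload))  # compressed payload is much smaller
```

In the actual system the morphing runs client-side in the browser, so only the compressed key visemes need to be transmitted rather than every intermediate frame; the sketch above performs the interpolation in Python merely to show the frame-generation and compression steps.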