Paper information - Development of an articulatory visual-speech synthesizer to support language learning

Development of an articulatory visual-speech synthesizer to support language learning

This paper presents a two-dimensional (2D) visual-speech synthesizer to support language learning. A visual-speech synthesizer animates the human articulators in synchronization with speech signals, e.g., the output of a text-to-speech synthesizer. A visual-speech animation can give language learners a concrete illustration of how to move and where to place the articulators when pronouncing a phoneme. We adopt 2D vector-based viseme models and have compiled a collection of visemes that covers the articulation of all English phonemes (42 visemes for the 44 English phonemes). Articulatory animations are produced by morphing between properly selected vector-based articulation images. In this way, we have developed an articulatory visual-speech synthesizer that can accept free-text input and synthesize articulatory dynamics in real time. An evaluation based on “lip-reading” with 32 subjects shows that they can identify the appropriate word(s) from the articulation animation alone nearly 80% of the time.
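The morphing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each viseme is stored as a list of 2D control points with the same point count and ordering, and that in-between frames come from plain linear interpolation. The abstract does not specify the authors' actual morphing method, and the names `morph`, `inbetween_frames`, `closed_lips` and `open_lips` are illustrative only.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Shape = List[Point]  # one vector outline, e.g. a lip contour (assumed representation)


def morph(src: Shape, dst: Shape, t: float) -> Shape:
    """Linearly interpolate between two outlines; t=0 gives src, t=1 gives dst."""
    if len(src) != len(dst):
        raise ValueError("both outlines need the same number of control points")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
        for (x0, y0), (x1, y1) in zip(src, dst)
    ]


def inbetween_frames(src: Shape, dst: Shape, n_frames: int) -> List[Shape]:
    """Return n_frames outlines, evenly spaced from src to dst inclusive."""
    if n_frames < 2:
        raise ValueError("need at least two frames (start and end)")
    return [morph(src, dst, i / (n_frames - 1)) for i in range(n_frames)]


# Hypothetical three-point outlines, not taken from the paper's viseme set.
closed_lips: Shape = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
open_lips: Shape = [(0.0, 0.0), (1.0, -1.0), (2.0, 0.0)]

frames = inbetween_frames(closed_lips, open_lips, 5)
for frame in frames:
    print(frame)
```

In a synthesizer of this kind, the number of in-between frames for each viseme pair would presumably follow the phoneme durations reported by the text-to-speech engine, so that the animation stays synchronized with the audio.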

Wai Kit Lo | Helen M. Meng | Ka-Ho Wong | Wai-Kim Leung
