Generating Animated Pronunciation from Speech Through Articulatory Feature Extraction

We automatically generate CG animations that express the articulatory movements of speech through articulatory feature (AF) extraction, to help learners study pronunciation. The proposed system uses magnetic resonance imaging (MRI) data to map AFs to the coordinate values needed to generate the animations; MRI data lets us observe in detail the movements of the tongue, palate, and pharynx while a person utters words. Both the AFs and the coordinate values are extracted by multi-layer neural networks (MLNs). Specifically, the system displays animations of the pronunciation movements of both the learner and the teacher, generated from their speech, to show where the learner's pronunciation is wrong. Learners can thus understand both their incorrect pronunciation and the correct articulation through concrete animated examples. Experiments comparing MRI data with the generated animations confirmed the accuracy of the extracted articulatory features, and we also verified the effectiveness of using AFs to generate the animations.
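The abstract describes a pipeline in which an MLN maps an extracted AF vector to the coordinate values of articulator contours used as animation key frames. The paper does not specify the network architecture here, so the following is only a minimal sketch: a two-layer feedforward network with randomly initialized weights standing in for a trained model, and all dimensions (`N_AF`, `N_HIDDEN`, `N_POINTS`) are assumed for illustration.

```python
import numpy as np

# All sizes below are hypothetical; the abstract does not specify them.
N_AF = 28        # length of the articulatory-feature vector (assumed)
N_HIDDEN = 64    # hidden-layer width (assumed)
N_POINTS = 12    # control points on the mid-sagittal contour (assumed)

rng = np.random.default_rng(0)

# Random weights stand in for a trained multi-layer network (MLN).
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_AF))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (2 * N_POINTS, N_HIDDEN))
b2 = np.zeros(2 * N_POINTS)

def af_to_coordinates(af):
    """Map one AF vector to (x, y) coordinates of contour control points."""
    h = np.tanh(W1 @ af + b1)         # nonlinear hidden layer
    out = W2 @ h + b2                 # linear output layer
    return out.reshape(N_POINTS, 2)   # one (x, y) pair per control point

# Example: one AF frame becomes one animation key frame.
af_frame = rng.uniform(0.0, 1.0, N_AF)
coords = af_to_coordinates(af_frame)
print(coords.shape)  # (12, 2)
```

Running this mapping over each AF frame of an utterance would yield a sequence of contour coordinates that a renderer could interpolate into the CG animation; the actual system presumably trains the network on coordinates annotated from the MRI data rather than using random weights.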