Speech Driven Facial Animation Using Chinese Mandarin Pronunciation Rules

This paper presents an integrated system for synthesizing facial animation from speech. A network called IFNET, composed of context-dependent hidden Markov models (HMMs) representing Chinese sub-syllables, is employed to extract the sequence of Chinese initials and finals from the input speech. Rather than being built from a finite audio-visual database, IFNET is constructed directly from Mandarin Chinese pronunciation rules. To cope with the large computational cost, we embed the forward-backward search algorithm into the search over IFNET. Once the initial and final sequence has been obtained, it is converted into MPEG-4 high-level facial animation parameters that drive a 3D head model to perform the corresponding facial expressions. Experimental results show that the system reproduces realistic mouth shapes from speech across a wide range of Chinese-speaking scenarios.
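As a rough illustration of the final conversion stage described above, the Python sketch below maps a recognized initial/final sequence to a stream of MPEG-4 high-level FAP frames (the standard's viseme_select1/viseme_select2 pair plus a 0-63 blend weight), interpolating between consecutive visemes to approximate coarticulation. The sub-syllable-to-viseme table and the frame timing are illustrative assumptions, not the paper's actual mapping.

```python
# Minimal sketch: converting a Chinese initial/final sequence into
# MPEG-4 high-level viseme frames. The mapping table below is a guess
# for illustration; MPEG-4 defines viseme_select values 0..14
# (e.g. 1 = p/b/m, 2 = f/v, 4 = t/d, 8 = n/l, 10..14 = vowels).
from dataclasses import dataclass

SUBSYLLABLE_TO_VISEME = {
    "b": 1, "p": 1, "m": 1,   # bilabials
    "f": 2,                   # labiodental
    "d": 4, "t": 4,           # alveolar stops
    "n": 8, "l": 8,
    "a": 10, "e": 11, "i": 12, "o": 13, "u": 14,  # finals (approximate)
}

@dataclass
class HighLevelFAPFrame:
    viseme_select1: int  # current viseme
    viseme_select2: int  # next viseme being blended in
    viseme_blend: int    # 0..63, weight of viseme_select1

def sequence_to_fap_frames(subsyllables, frames_per_unit=5):
    """Interpolate between consecutive visemes to mimic coarticulation."""
    visemes = [SUBSYLLABLE_TO_VISEME.get(s, 0) for s in subsyllables]
    if not visemes:
        return []
    frames = []
    # Pair each viseme with its successor (the last one pairs with itself).
    for cur, nxt in zip(visemes, visemes[1:] + [visemes[-1]]):
        for k in range(frames_per_unit):
            blend = int(63 * (1 - k / frames_per_unit))  # fade cur -> nxt
            frames.append(HighLevelFAPFrame(cur, nxt, blend))
    return frames

if __name__ == "__main__":
    # "ma" decomposed into initial "m" and final "a"
    for frame in sequence_to_fap_frames(["m", "a"]):
        print(frame)
```

In the actual system these high-level parameters would be fed to an MPEG-4 compliant face animation player each frame; the sketch only shows the shape of the data handed over at that interface.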
