Speech Driven Facial Animation Using Chinese Mandarin Pronunciation Rules

This paper presents an integrated system for synthesizing facial animation from speech. A network called IFNET, composed of context-dependent hidden Markov models (HMMs) representing Chinese sub-syllables, is employed to extract the sequence of Chinese initials and finals from the input speech. Rather than being built from a finite audio-visual database, IFNET is constructed directly from Mandarin Chinese pronunciation rules. To cope with the large computational cost, we embed the forward-backward search algorithm into the search over IFNET. Once the initial and final sequence has been obtained, it is converted into MPEG-4 high-level facial animation parameters that drive a 3D head model to perform the corresponding facial expressions. Experimental results show that the system reproduces realistic mouth shapes from speech across a wide range of Chinese-speaking scenarios.
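As a rough illustration of the final conversion stage described above, the Python sketch below maps a recognized initial/final sequence to a stream of MPEG-4 high-level FAP frames (the standard's viseme_select1/viseme_select2 pair plus a 0-63 blend weight), interpolating between consecutive visemes to approximate coarticulation. The sub-syllable-to-viseme table and the frame timing are illustrative assumptions, not the paper's actual mapping.

```python
# Minimal sketch: converting a Chinese initial/final sequence into
# MPEG-4 high-level viseme frames. The mapping table below is a guess
# for illustration; MPEG-4 defines viseme_select values 0..14
# (e.g. 1 = p/b/m, 2 = f/v, 4 = t/d, 8 = n/l, 10..14 = vowels).
from dataclasses import dataclass

SUBSYLLABLE_TO_VISEME = {
    "b": 1, "p": 1, "m": 1,   # bilabials
    "f": 2,                   # labiodental
    "d": 4, "t": 4,           # alveolar stops
    "n": 8, "l": 8,
    "a": 10, "e": 11, "i": 12, "o": 13, "u": 14,  # finals (approximate)
}

@dataclass
class HighLevelFAPFrame:
    viseme_select1: int  # current viseme
    viseme_select2: int  # next viseme being blended in
    viseme_blend: int    # 0..63, weight of viseme_select1

def sequence_to_fap_frames(subsyllables, frames_per_unit=5):
    """Interpolate between consecutive visemes to mimic coarticulation."""
    visemes = [SUBSYLLABLE_TO_VISEME.get(s, 0) for s in subsyllables]
    if not visemes:
        return []
    frames = []
    # Pair each viseme with its successor (the last one pairs with itself).
    for cur, nxt in zip(visemes, visemes[1:] + [visemes[-1]]):
        for k in range(frames_per_unit):
            blend = int(63 * (1 - k / frames_per_unit))  # fade cur -> nxt
            frames.append(HighLevelFAPFrame(cur, nxt, blend))
    return frames

if __name__ == "__main__":
    # "ma" decomposed into initial "m" and final "a"
    for frame in sequence_to_fap_frames(["m", "a"]):
        print(frame)
```

In the actual system these high-level parameters would be fed to an MPEG-4 compliant face animation player each frame; the sketch only shows the shape of the data handed over at that interface.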
