Speech-driven lip motion generation with a trajectory HMM

Automatic speech animation remains a challenging problem: given a speech signal, find the optimal sequence of animation parameter configurations. In this paper we present a novel technique to automatically synthesise lip motion trajectories from a speech signal. The system predicts lip motion units from the speech signal and generates animation trajectories automatically using a "Trajectory Hidden Markov Model". Under the maximum likelihood (ML) criterion, its parameter generation algorithm produces optimal smooth motion trajectories that drive control points on the lips directly. Additionally, experiments were carried out to find the model unit that produces the most accurate results. Finally, a perceptual evaluation showed that the developed motion units perform better than phonemes.
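To make the parameter generation step concrete, the following is a minimal sketch of maximum likelihood parameter generation in the form commonly used for HMM-based synthesis. It is not the system's actual implementation. It assumes a single one-dimensional control-point coordinate, a state sequence that has already been chosen (so per-frame Gaussian means and variances are given), diagonal covariances, and a simple delta window 0.5 * (c[t+1] - c[t-1]) with frames outside the utterance treated as zero. The names `build_window`, `solve` and `generate_trajectory` are hypothetical. With W the matrix that maps the static trajectory c to stacked static and delta features, the smooth trajectory is the solution of (W' P W) c = W' P mu, where P is the diagonal precision matrix and mu the stacked means.

```python
def build_window(T):
    """Return the 2T x T matrix W mapping a static trajectory to [static, delta] rows."""
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0                  # static feature: c[t]
        if t - 1 >= 0:
            W[2 * t + 1][t - 1] = -0.5     # delta feature: 0.5 * (c[t+1] - c[t-1])
        if t + 1 < T:
            W[2 * t + 1][t + 1] = 0.5      # neighbours outside the utterance count as zero
    return W


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def generate_trajectory(static_mean, delta_mean, static_var, delta_var):
    """Return the ML static trajectory under the static/delta constraints."""
    T = len(static_mean)
    W = build_window(T)
    mu = [0.0] * (2 * T)      # stacked means, interleaved as [static, delta] per frame
    prec = [0.0] * (2 * T)    # diagonal of the precision matrix P
    for t in range(T):
        mu[2 * t], mu[2 * t + 1] = static_mean[t], delta_mean[t]
        prec[2 * t], prec[2 * t + 1] = 1.0 / static_var[t], 1.0 / delta_var[t]
    rows = range(2 * T)
    # Normal equations: A = W' P W, b = W' P mu
    A = [[sum(W[r][i] * prec[r] * W[r][j] for r in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * prec[r] * mu[r] for r in rows) for i in range(T)]
    return solve(A, b)


if __name__ == "__main__":
    # Illustrative values only: static means that jump from 0 to 1, zero delta means.
    trajectory = generate_trajectory(
        static_mean=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
        delta_mean=[0.0] * 6,
        static_var=[1.0] * 6,
        delta_var=[0.1] * 6,
    )
    print([round(v, 3) for v in trajectory])
```

Because the delta terms penalise abrupt frame-to-frame changes, a small delta variance should pull the generated trajectory away from the stepwise static means towards a smoother curve. A practical system would use multi-dimensional features and a banded solver rather than dense elimination.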
