Real-time speech-driven facial animation using formant analysis

Formant analysis is a technique widely used in speech analysis and synthesis. In this paper, we present a simple, fast, and effective method for real-time speech-driven facial animation based on formant analysis. The speech signal is first processed by a formant analyzer. Because the resulting formants are known to be correlated with vocal tract shape, they can be mapped directly to mouth shapes. A median filter and energy modulation are then applied to smooth the mouth shape sequence, which drives our synthetic 3D head model in synchrony with the audio. The proposed method is simple and does not rely on contextual information, making it well suited to real-time two-way communication applications. Because it extracts mouth shapes from acoustic features alone, the method is language independent. It is also shown to work well in the speaker-independent case.
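
As a rough illustration of the pipeline described above, the sketch below estimates formants per frame with autocorrelation LPC, maps the first two formants to a two-parameter mouth shape, and smooths the result with a median filter and energy modulation. The frame length, hop size, LPC order, formant-to-shape scaling, and the (open, width) mouth parameterization are all illustrative assumptions, not the paper's actual values.

```python
# Minimal sketch of a formant-driven mouth-shape pipeline.
# All numeric choices below (frame size, LPC order, scaling ranges)
# are illustrative assumptions, not values from the paper.
import numpy as np
from scipy.signal import medfilt

def lpc_coeffs(frame, order=12):
    """Autocorrelation-method LPC via the Yule-Walker equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^-k

def formants(frame, fs, order=12, max_bw=400.0):
    """Estimate formant frequencies (Hz) from LPC polynomial roots."""
    roots = np.roots(lpc_coeffs(frame, order))
    roots = roots[np.imag(roots) > 0]          # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi  # root radius -> bandwidth
    keep = (freqs > 90) & (bws < max_bw)       # reject spurious wide peaks
    return np.sort(freqs[keep])

def mouth_shapes(signal, fs, frame_len=512, hop=256):
    """Map per-frame formants to (open, width), then smooth and modulate."""
    opens, widths, energies = [], [], []
    window = np.hamming(frame_len)
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len] * window
        energies.append(np.sqrt(np.mean(frame ** 2)))
        f = formants(frame, fs)
        f1 = f[0] if len(f) > 0 else 500.0     # fall back to neutral values
        f2 = f[1] if len(f) > 1 else 1500.0
        opens.append(np.clip((f1 - 250) / 600, 0, 1))    # F1 ~ jaw opening
        widths.append(np.clip((f2 - 800) / 1700, 0, 1))  # F2 ~ lip spreading
    # Median filter removes single-frame outliers in the shape sequence.
    opens = medfilt(np.asarray(opens), kernel_size=5)
    widths = medfilt(np.asarray(widths), kernel_size=5)
    # Energy modulation: close the mouth during silence.
    energy = np.asarray(energies)
    gain = energy / (energy.max() + 1e-9)
    return opens * gain, widths
```

Feeding `mouth_shapes` a mono waveform (e.g. 16 kHz speech) yields two per-frame curves that could drive jaw-open and lip-width parameters of a 3D head model; since each frame is processed independently, the mapping needs no lookahead or contextual information and can run in real time.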
