Comparison of Phoneme and Viseme Based Acoustic Units for Speech Driven Realistic Lip Animation

Natural-looking lip animation, synchronized with incoming speech, is essential for realistic character animation. In this work, we evaluate the performance of phoneme- and viseme-based acoustic units, with and without context information, for generating realistic lip synchronization using HMM-based recognition systems. Objective evaluations show that viseme-based units with context information outperform the other methods.
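To make the phoneme/viseme distinction concrete, the sketch below shows a minimal, hypothetical phoneme-to-viseme mapping of the kind such systems rely on: several acoustically distinct phonemes collapse onto one visually indistinguishable mouth shape, and context-dependent units are formed by attaching the neighboring visemes (analogous to triphones in speech recognition). The viseme inventory, unit names, and mapping table here are illustrative assumptions, not the paper's actual inventory.

```python
# Illustrative phoneme-to-viseme grouping (hypothetical inventory, not the
# paper's actual mapping): visually indistinguishable phonemes share a viseme.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "s": "V_alveolar", "z": "V_alveolar",
    "aa": "V_open", "ae": "V_open",
    "iy": "V_spread", "ih": "V_spread",
    "uw": "V_rounded", "ow": "V_rounded",
    "sil": "V_silence",
}

def to_visemes(phonemes):
    """Collapse a phoneme sequence into its viseme sequence (many-to-one)."""
    return [PHONEME_TO_VISEME.get(p, "V_other") for p in phonemes]

def with_context(visemes):
    """Form context-dependent units left-viseme - viseme + right-viseme,
    padding the edges with silence."""
    padded = ["V_silence"] + visemes + ["V_silence"]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]

if __name__ == "__main__":
    phones = ["sil", "m", "aa", "m", "sil"]
    visemes = to_visemes(phones)
    print(visemes)
    # ['V_silence', 'V_bilabial', 'V_open', 'V_bilabial', 'V_silence']
    print(with_context(visemes))
    # ['V_silence-V_silence+V_bilabial', 'V_silence-V_bilabial+V_open', ...]
```

Because the phoneme-to-viseme map is many-to-one, a viseme-based recognizer has fewer, visually better-separated classes to distinguish, which is one intuition for why such units can work well for lip animation.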
