Realistic Lip Syncing for Virtual Character Using Common Viseme Set

Speech is one of the most important forms of interaction between humans. Consequently, much avatar research devotes significant attention to this area. Creating animated speech requires a facial model capable of representing the myriad shapes the human face assumes during speech, as well as a method for producing the correct shape at the correct time. One of the main challenges is to create precise lip movements for the avatar and synchronize them with recorded audio. This paper proposes a new lip synchronization algorithm for realistic applications, which can be employed to generate facial movements synchronized with audio produced from natural speech or through a text-to-speech engine. The method requires an animator to construct animations using a canonical set of visemes for all pairwise combinations of a reduced phoneme set. These animations are then stitched together smoothly to construct the final animation.
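To make the viseme-mapping-and-stitching idea concrete, the sketch below is a minimal illustration, not the paper's algorithm: it maps timed phonemes onto a small hypothetical viseme set and linearly cross-fades between successive viseme poses to obtain per-frame blend weights. The phoneme table, viseme names, and timing values are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation): map a timed
# phoneme sequence onto a reduced viseme set, then linearly cross-fade between
# successive viseme keyframes to produce per-frame blend weights.

# Hypothetical reduced phoneme-to-viseme table; a real table covers the full
# phoneme inventory of the target language.
PHONEME_TO_VISEME = {
    "p": "PBM", "b": "PBM", "m": "PBM",
    "f": "FV",  "v": "FV",
    "aa": "AA", "iy": "EE", "uw": "OO",
    "sil": "REST",
}

def phonemes_to_keyframes(timed_phonemes):
    """Convert (phoneme, start_time) pairs into (viseme, time) keyframes."""
    return [(PHONEME_TO_VISEME.get(p, "REST"), t) for p, t in timed_phonemes]

def blend_weights(keyframes, t):
    """Return per-viseme weights at time t by cross-fading adjacent keyframes."""
    for (v0, t0), (v1, t1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            alpha = (t - t0) / (t1 - t0) if t1 > t0 else 1.0
            return {v0: 1.0 - alpha, v1: alpha}
    return {keyframes[-1][0]: 1.0} if keyframes else {}

if __name__ == "__main__":
    # Example timing from a (hypothetical) forced alignment of a short utterance.
    phonemes = [("sil", 0.0), ("m", 0.1), ("aa", 0.25), ("p", 0.5), ("sil", 0.7)]
    keys = phonemes_to_keyframes(phonemes)
    print(blend_weights(keys, 0.3))  # e.g. {'AA': 0.8, 'PBM': 0.2}
```

In practice the linear cross-fade would be replaced by the smooth stitching of pre-built pairwise viseme animations described in the abstract; the blend weights would then drive the facial model's viseme targets.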
