LIPSYNC.AI: A.I. Driven Lips and Tongue Animations Using Articulatory Phonetic Descriptors and FACS Blendshapes

We present a solution for generating realistic lips and tongue animations using a novel hybrid method that draws on both recent advances in deep learning and the theory behind speech and phonetics. Our solution generates highly accurate and natural animations of the jaw, lips and tongue by incorporating additional phonetic information during neural network training and by procedurally mapping the network's outputs directly to FACS-based blendshapes [Prince et al. 2015], in order to comply with animation industry standards.
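
The paper itself does not include code; the minimal Python/NumPy sketch below only illustrates the kind of procedural step the abstract describes, in which per-frame network outputs expressed as articulatory phonetic descriptors are mapped to FACS-based blendshape weights. The descriptor and blendshape names, the mapping matrix, and the function `descriptors_to_blendshapes` are hypothetical assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical articulatory phonetic descriptors a network might predict per
# animation frame, with activations in [0, 1]; the names are illustrative only.
DESCRIPTORS = ["jaw_open", "lip_rounding", "lip_closure", "tongue_up", "tongue_forward"]

# Hypothetical FACS-based blendshapes exposed by a character rig.
BLENDSHAPES = ["JawOpen", "LipPucker", "LipsTogether", "TongueUp", "TongueOut"]

# Assumed procedural mapping: rows are descriptors, columns are blendshapes.
# In practice such a matrix would be authored per rig rather than hard-coded.
FACS_MAP = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],  # jaw_open       -> JawOpen
    [0.0, 1.0, 0.0, 0.0, 0.0],  # lip_rounding   -> LipPucker
    [0.0, 0.0, 1.0, 0.0, 0.0],  # lip_closure    -> LipsTogether
    [0.0, 0.0, 0.0, 1.0, 0.0],  # tongue_up      -> TongueUp
    [0.0, 0.0, 0.0, 0.0, 1.0],  # tongue_forward -> TongueOut
])

def descriptors_to_blendshapes(frames: np.ndarray) -> np.ndarray:
    """Map per-frame descriptor activations (T x D) to blendshape weights (T x B),
    clamped to the [0, 1] range expected by most animation pipelines."""
    return np.clip(frames @ FACS_MAP, 0.0, 1.0)

if __name__ == "__main__":
    # Two example frames of (hypothetical) network output.
    frames = np.array([
        [0.8, 0.1, 0.0, 0.3, 0.0],  # open jaw, tongue slightly raised
        [0.1, 0.9, 0.0, 0.0, 0.2],  # rounded lips, tongue slightly forward
    ])
    print(descriptors_to_blendshapes(frames))
```

Because the mapping is kept separate from the network, the same predicted descriptors could in principle drive any rig that exposes a FACS-compliant blendshape set.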

[1] Xutao Li et al. A Deep Learning Approach to Nightfire Detection based on Low-Light Satellite. Computer Science & Information Technology (CS & IT), 2021.

[2] Yisong Yue et al. A deep learning approach for generalized speech animation. ACM Trans. Graph., 2017.

[3] Subhransu Maji et al. VisemeNet: Audio-Driven Animator-Centric Speech Animation. ACM Trans. Graph., 2018.

[4] A. Frank van der Stappen et al. Audio-driven emotional speech animation for interactive virtual characters. Comput. Animat. Virtual Worlds, 2019.

[5] Eugene Fiume et al. JALI-Driven Expressive Facial Animation and Multilingual Speech in Cyberpunk 2077. SIGGRAPH Talks, 2020.

[6] Katherine B. Martin et al. Facial Action Coding System. 2015.

[7] Ausdang Thangthai et al. Synthesising visual speech using dynamic visemes and deep learning architectures. Comput. Speech Lang., 2019.

[8] Michael J. Black et al. Capture, Learning, and Synthesis of 3D Speaking Styles. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[9] Jaakko Lehtinen et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion. ACM Trans. Graph., 2017.