Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar

In this paper, we propose a complete pipeline of efficient, low-cost techniques for constructing a realistic 3D text-driven emotive audio-visual avatar on the fly from a single 2D frontal-view face image of any person. The real-time conversion proceeds in three steps. First, a personalized 3D face model is built from the 2D face image using a fully automatic 3D face shape and texture reconstruction framework. Second, using standard MPEG-4 Facial Animation Parameters (FAPs), the face model is animated by the viseme and expression channels, complemented by a visual prosody channel that controls head, eye, and eyelid movements. Finally, the facial animation is combined and synchronized with emotive synthetic speech, which is generated by incorporating an emotion transformer into a Festival-MBROLA text-to-neutral-speech synthesizer.
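As a rough illustration of the second step, the sketch below merges several animation channels into one MPEG-4 FAP frame. The abstract does not say how the viseme, expression, and visual prosody channels are combined, so the weighted additive blend, the `Channel` type, the `blend_channels` function, and the FAP numbers and values in the usage lines are all assumptions made for illustration only. The one fixed fact is that MPEG-4 defines 68 FAPs.

```python
from dataclasses import dataclass
from typing import Dict, List

NUM_FAPS = 68  # MPEG-4 defines 68 facial animation parameters


@dataclass
class Channel:
    """One animation channel (hypothetical structure, not from the paper)."""
    name: str
    weight: float            # assumed per-channel blending weight
    faps: Dict[int, float]   # FAP number (1..68) -> value for this frame


def blend_channels(channels: List[Channel]) -> List[float]:
    """Sum the weighted FAP values of all channels into a single frame.

    Additive blending is an assumption; the actual system may combine
    its channels differently.
    """
    frame = [0.0] * NUM_FAPS
    for channel in channels:
        for fap_number, value in channel.faps.items():
            if not 1 <= fap_number <= NUM_FAPS:
                raise ValueError(f"FAP number out of range: {fap_number}")
            # FAP numbers are 1-based; the frame list is 0-based.
            frame[fap_number - 1] += channel.weight * value
    return frame


# Illustrative values only: the FAP numbers below are arbitrary examples.
viseme = Channel("viseme", 1.0, {3: 100.0})
expression = Channel("expression", 0.5, {3: 40.0, 12: 20.0})
prosody = Channel("visual_prosody", 1.0, {48: 15.0})
frame = blend_channels([viseme, expression, prosody])
```

In a full system, one such frame would be produced per animation time step and timed against the phoneme durations of the synthesized speech, which is the synchronization described in the third step.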
