Paper information - A Video, Text, and Speech-Driven Realistic 3-D Virtual Head for Human–Machine Interface

A Video, Text, and Speech-Driven Realistic 3-D Virtual Head for Human–Machine Interface

This paper proposes a realistic facial animation system, built on a 3-D virtual head and driven by multiple inputs, for human-machine interfaces. The system can be driven independently by video, text, or speech, so it can interact with humans through diverse interfaces. A parameterized model is combined with a muscular model to trade off computational efficiency against the realism of the 3-D facial animation. An online appearance model tracks 3-D facial motion from video within a particle filtering framework; multiple measurements, namely the pixel color values of the input image and the Gabor wavelet coefficients of the illumination ratio image, are fused when constructing the online appearance model to reduce the influence of lighting and person dependence. A tri-phone model reduces the computational cost of visual co-articulation in speech-synchronized viseme synthesis without sacrificing performance. Objective and subjective experiments show that the system is suitable for human-machine interaction.
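The abstract gives no implementation details, so the following is only a generic illustration of the idea of fusing two measurements inside a particle filter, not the authors' tracker. It is a minimal sketch that assumes a scalar state, a random-walk motion model, Gaussian noise, and two conditionally independent cues whose likelihoods are multiplied. All function names, parameters, and noise levels are hypothetical.

```python
import math
import random


def gaussian_likelihood(residual, sigma):
    """Unnormalized Gaussian likelihood of a measurement residual."""
    return math.exp(-0.5 * (residual / sigma) ** 2)


def particle_filter_step(particles, z_a, z_b, rng,
                         process_sigma=0.1, sigma_a=0.5, sigma_b=0.3):
    """One predict/weight/resample step of a bootstrap particle filter.

    z_a and z_b are two measurements of the same scalar state. Their
    likelihoods are multiplied, which assumes they are conditionally
    independent given the state (an assumption of this sketch).
    """
    # Predict: random-walk motion model (hypothetical, for illustration).
    predicted = [p + rng.gauss(0.0, process_sigma) for p in particles]

    # Weight: fuse the two cues by multiplying their likelihoods.
    weights = [gaussian_likelihood(z_a - p, sigma_a) *
               gaussian_likelihood(z_b - p, sigma_b) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
        total = float(len(predicted))
    weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted particles.
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample: multinomial resampling proportional to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(true_states, n_particles=500, seed=0):
    """Simulate two noisy cues per step and return the filtered estimates."""
    rng = random.Random(seed)
    particles = [rng.uniform(-2.0, 2.0) for _ in range(n_particles)]
    estimates = []
    for x in true_states:
        z_a = x + rng.gauss(0.0, 0.5)  # noisier cue
        z_b = x + rng.gauss(0.0, 0.3)  # more reliable cue
        particles, est = particle_filter_step(particles, z_a, z_b, rng)
        estimates.append(est)
    return estimates
```

Calling `track([0.05 * t for t in range(40)])` returns one estimate per time step. In a real face tracker the state would be a vector of pose and animation parameters and each likelihood would compare image-derived features with the appearance model; those parts are outside the scope of this sketch.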

Jun Yu | Zengfu Wang
