Video-realistic image-based eye animation via statistically driven state machines

In this work we present a novel image-based system for creating video-realistic eye animations synchronized with arbitrary spoken output. Such animations are useful for giving a face to multimedia applications such as virtual operators in dialog systems. Our eye animation system consists of two parts: an eye control unit and a rendering engine that synthesizes eye animations by combining 3D and image-based models. The eye control unit is grounded in eye movement physiology and in a statistical analysis of recorded human subjects. As previous studies have shown, eye movements differ between listening and talking. We focus on the latter and are the first to design a model that fully automatically couples eye blinks and movements with phonetic and prosodic information extracted from spoken language. We extend the well-known simple gaze model by refining mutual gaze to better reproduce human eye movements, and we further improve the eye movement models by accounting for head tilts, ocular torsion, and eyelid movements. Owing mainly to our integrated blink and gaze model and to the speech-driven control of eye movements, subjective tests indicate that participants cannot distinguish between real eye motions and our animations, which had not been achieved before.
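To make the idea of a statistically driven state machine concrete, the following is a minimal sketch, not the paper's actual implementation: a two-state gaze machine (mutual gaze vs. gaze aversion) whose dwell times are sampled from log-normal distributions, with blinks probabilistically coupled to gaze shifts. All parameter values, state names, and the coupling probability are illustrative assumptions; in the described system these would be estimated from recordings of human subjects and modulated by phonetic and prosodic cues.

```python
import random

# Hypothetical parameters: state dwell times are modeled as log-normal
# distributions (a common fit for human gaze and blink timing); the
# (mu, sigma) values below are illustrative, not measured from data.
GAZE_STATES = {
    "mutual_gaze": {"mu": -0.5, "sigma": 0.5, "next": "gaze_away"},
    "gaze_away":   {"mu": -1.0, "sigma": 0.4, "next": "mutual_gaze"},
}
BLINK_ON_TRANSITION_PROB = 0.3  # assumed blink/gaze-shift coupling strength

def simulate_gaze(total_time, rng=None):
    """Generate a timed sequence of (start, duration, state, blink) events
    covering at least `total_time` seconds of animation."""
    rng = rng or random.Random(42)
    events, t, state = [], 0.0, "mutual_gaze"
    while t < total_time:
        params = GAZE_STATES[state]
        # Sample how long to stay in the current gaze state.
        duration = rng.lognormvariate(params["mu"], params["sigma"])
        # Decide whether a blink accompanies the upcoming gaze shift.
        blink = rng.random() < BLINK_ON_TRANSITION_PROB
        events.append((round(t, 3), round(duration, 3), state, blink))
        t += duration
        state = params["next"]
    return events
```

A full system would condition the transition probabilities and dwell-time parameters on the speech signal (e.g. pitch accents or phrase boundaries) rather than alternating deterministically between two states.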
