Making a thinking-talking head

This paper describes the Thinking-Talking Head, an interdisciplinary project that sits between and draws upon engineering/computer science and behavioural/cognitive science; research and performance; implementation and evaluation. The project involves collaboration between computer scientists, engineers, language technologists and cognitive scientists, and its aim is twofold: (a) to create a 3-D computer animation of a human head that interacts in real time with human agents, and (b) to serve as a research platform that drives research in the contributing disciplines and in talking-head research in general. The thinking-talking head will emulate elements of face-to-face conversation through speech (including intonation), gaze and gesture. It must therefore have an active sensorium that accurately reflects the properties of its immediate environment, and it must be able to generate appropriate communicative signals to feed back to the interlocutor. Here we describe the current implementation and outline how we are tackling issues concerning both the outputs from the head (synthetic voice, visual speech, facial expressiveness and naturalness) and the inputs to it (auditory-visual speech recognition, emotion recognition, auditory-visual speaker localization). We describe how these head functions will be tuned and evaluated using various paradigms, including an imitation paradigm.
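As a minimal illustration of one input channel, the sketch below estimates the time delay of arrival between two microphone signals using generalized cross-correlation with phase transform (GCC-PHAT), a standard technique for the kind of microphone-array speaker localization the head requires. This is a sketch under our own assumptions, not the project's implementation; the function name, sampling rate and simulated delay are hypothetical.

    import numpy as np

    def gcc_phat(sig, ref, fs, max_tau=None):
        # Cross-power spectrum with phase-transform weighting: discards
        # magnitude and keeps only phase, which sharpens the correlation peak.
        n = len(sig) + len(ref)
        SIG = np.fft.rfft(sig, n=n)
        REF = np.fft.rfft(ref, n=n)
        R = SIG * np.conj(REF)
        cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)
        max_shift = n // 2
        if max_tau is not None:
            max_shift = min(int(fs * max_tau), max_shift)
        # Reorder so that index max_shift corresponds to zero lag.
        cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
        return (np.argmax(np.abs(cc)) - max_shift) / fs

    # Hypothetical two-microphone example: broadband noise delayed by 40 samples.
    fs = 16000
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(fs)
    delayed = np.roll(clean, 40)       # simulate a 2.5 ms propagation delay
    tau = gcc_phat(delayed, clean, fs)
    # For a two-microphone array: sin(azimuth) = tau * speed_of_sound / mic_spacing.
    print(f"estimated delay: {tau * 1e3:.2f} ms")   # -> 2.50 ms

In the full system, such auditory delay estimates would be fused with visual cues to orient the head's gaze toward the active speaker.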
