Expressive Speech for a Virtual Talking Head

This paper presents our work on building Eface, an expressive facial speech synthesis system that can be used on a social or service robot. Eface aims to enable a robot to deliver information clearly, with empathetic speech and an expressive virtual face. The system is built on two open-source software packages: the Festival speech synthesis system, which gives robots the ability to speak with different voices and emotions, and Xface, a 3D talking head, which enables the robot to display various human facial expressions. This paper addresses how to express different speech emotions with Festival and how to integrate the synthesized speech with Xface. We have also implemented Eface on a physical robot and tested it in several service scenarios.

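As a rough illustration of the kind of pipeline the abstract describes, the sketch below is a minimal, hypothetical example, not the authors' implementation. It uses Festival's `text2wave` utility to synthesize speech after applying a prosody setting (a common way to approximate emotional coloring) and then hands the resulting WAV file to a placeholder talking-head step. The specific prosody values, the emotion presets, and the `play_on_talking_head` stub are assumptions made for illustration; Eface's actual emotion models and its Xface integration are described in the paper itself.

```python
import subprocess
import tempfile

# Hypothetical prosody presets. Festival's Duration_Stretch parameter is a
# real knob, but these particular values are illustrative guesses only.
EMOTION_PRESETS = {
    "neutral": "(Parameter.set 'Duration_Stretch 1.0)",
    "sad":     "(Parameter.set 'Duration_Stretch 1.3)",
    "happy":   "(Parameter.set 'Duration_Stretch 0.9)",
}

def synthesize(text: str, emotion: str = "neutral") -> str:
    """Synthesize `text` to a WAV file with Festival's text2wave tool."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(text)
        txt_path = f.name
    wav_path = txt_path.replace(".txt", ".wav")
    # text2wave ships with Festival; -eval evaluates a Scheme expression
    # before synthesis, which is how the prosody preset is applied here.
    subprocess.run(
        ["text2wave", txt_path, "-o", wav_path,
         "-eval", EMOTION_PRESETS.get(emotion, EMOTION_PRESETS["neutral"])],
        check=True,
    )
    return wav_path

def play_on_talking_head(wav_path: str) -> None:
    """Placeholder for handing the audio to the 3D face (e.g. XfacePlayer).
    The real system drives Xface with MPEG-4 facial animation parameters
    synchronized to the speech; this stub only marks that step."""
    print(f"Would send {wav_path} to the talking-head player")

if __name__ == "__main__":
    wav = synthesize("Your appointment is at three o'clock.", emotion="happy")
    play_on_talking_head(wav)
```

In this sketch the emotion is expressed purely through a duration change; a fuller approach would also adjust pitch targets and energy, and would time-align facial animation parameters with the synthesized audio, as the paper's Festival-Xface integration does.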