"May I talk to you? : -) " - facial animation from text

We introduce a facial animation system that produces real-time animation sequences, including speech synchronization and non-verbal speech-related facial expressions, from plain text input. A state-of-the-art text-to-speech synthesis component performs linguistic analysis of the input text and creates a speech signal from phonetic and intonation information. The phonetic transcription additionally drives a speech synchronization method for physically-based facial animation. Further high-level information from the linguistic analysis, such as the types of accents and pauses and the sentence type, is used to generate non-verbal speech-related facial expressions such as movements of the head, eyes, and eyebrows, or voluntary eye blinks. Moreover, emotions are translated into XML markup that triggers emotional facial expressions.
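The pipeline described above can be sketched schematically: the phonetic transcription drives mouth shapes, while higher-level prosodic markers (accents, sentence type) drive non-verbal events. The sketch below is purely illustrative, assuming a toy phoneme-to-viseme table and hypothetical names (`PHONEME_TO_VISEME`, `animation_events`); it is not the authors' actual API, and a real system would consume the full output of the TTS linguistic analysis.

```python
# Illustrative sketch of the text-to-animation event pipeline.
# All names and the viseme table are hypothetical, not the paper's implementation.

# Toy mapping from phonemes to mouth targets (visemes).
PHONEME_TO_VISEME = {
    "m": "closed_lips",
    "a": "open_jaw",
    "i": "spread_lips",
    "t": "tongue_tip",
    "_": "rest",  # pause
}

def animation_events(phonemes, accents, sentence_type):
    """Turn a phonetic transcription plus prosodic markers into a
    time-ordered list of animation events.

    phonemes      -- list of (phoneme, duration_in_seconds) pairs
    accents       -- set of phoneme indices carrying a pitch accent
    sentence_type -- e.g. "statement" or "question"
    """
    events = []
    t = 0.0
    for i, (ph, dur) in enumerate(phonemes):
        # Speech synchronization: one mouth target per phoneme.
        events.append({"time": t, "track": "mouth",
                       "target": PHONEME_TO_VISEME.get(ph, "rest")})
        # Non-verbal cue: accented syllables get an eyebrow raise.
        if i in accents:
            events.append({"time": t, "track": "eyebrows", "target": "raise"})
        t += dur
    # Sentence-level cue: questions end with a head movement.
    if sentence_type == "question":
        events.append({"time": t, "track": "head", "target": "tilt_up"})
    return events
```

A run on a short transcription, e.g. `animation_events([("m", 0.08), ("a", 0.12)], {1}, "question")`, yields mouth events for both phonemes, an eyebrow raise on the accented second phoneme, and a final head tilt; an animation back-end would then interpolate the physically-based face model between these targets.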
