"May I talk to you? : -) " - facial animation from text

We introduce a facial animation system that produces real-time animation sequences, including speech synchronization and non-verbal speech-related facial expressions, from plain text input. A state-of-the-art text-to-speech synthesis component performs linguistic analysis of the input text and creates a speech signal from phonetic and intonation information. The phonetic transcription additionally drives a speech synchronization method for physically-based facial animation. Further high-level information from the linguistic analysis, such as the types of accents and pauses and the sentence type, is used to generate non-verbal speech-related facial expressions such as movements of the head, eyes, and eyebrows, or voluntary eye blinks. Moreover, emotions are translated into XML markup that triggers emotional facial expressions.
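The pipeline described above can be sketched schematically: the phonetic transcription drives mouth shapes, while higher-level prosodic markers (accents, sentence type) drive non-verbal events. The sketch below is purely illustrative, assuming a toy phoneme-to-viseme table and hypothetical names (`PHONEME_TO_VISEME`, `animation_events`); it is not the authors' actual API, and a real system would consume the full output of the TTS linguistic analysis.

```python
# Illustrative sketch of the text-to-animation event pipeline.
# All names and the viseme table are hypothetical, not the paper's implementation.

# Toy mapping from phonemes to mouth targets (visemes).
PHONEME_TO_VISEME = {
    "m": "closed_lips",
    "a": "open_jaw",
    "i": "spread_lips",
    "t": "tongue_tip",
    "_": "rest",  # pause
}

def animation_events(phonemes, accents, sentence_type):
    """Turn a phonetic transcription plus prosodic markers into a
    time-ordered list of animation events.

    phonemes      -- list of (phoneme, duration_in_seconds) pairs
    accents       -- set of phoneme indices carrying a pitch accent
    sentence_type -- e.g. "statement" or "question"
    """
    events = []
    t = 0.0
    for i, (ph, dur) in enumerate(phonemes):
        # Speech synchronization: one mouth target per phoneme.
        events.append({"time": t, "track": "mouth",
                       "target": PHONEME_TO_VISEME.get(ph, "rest")})
        # Non-verbal cue: accented syllables get an eyebrow raise.
        if i in accents:
            events.append({"time": t, "track": "eyebrows", "target": "raise"})
        t += dur
    # Sentence-level cue: questions end with a head movement.
    if sentence_type == "question":
        events.append({"time": t, "track": "head", "target": "tilt_up"})
    return events
```

A run on a short transcription, e.g. `animation_events([("m", 0.08), ("a", 0.12)], {1}, "question")`, yields mouth events for both phonemes, an eyebrow raise on the accented second phoneme, and a final head tilt; an animation back-end would then interpolate the physically-based face model between these targets.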
