Generating Facial Expressions for Speech

This article reports results from a program that produces high-quality animation of facial expressions and head movements as automatically as possible in conjunction with meaning-based speech synthesis, including spoken intonation. The goal of the research is as much to test and define our theories of the formal semantics for such gestures as to produce convincing animation. Towards this end, we have produced a high-level programming language for three-dimensional (3-D) animation of facial expressions. We have been concerned primarily with expressions conveying information correlated with the intonation of the voice: this includes the differences of timing, pitch, and emphasis that are related to such semantic distinctions of discourse as “focus,” “topic” and “comment,” “theme” and “rheme,” or “given” and “new” information. We are also interested in the relation of affect or emotion to facial expression. Until now, systems have not embodied such rule-governed translation from spoken utterance meaning to facial expressions. Our system embodies rules that describe and coordinate these relations: intonation/information, intonation/affect, and facial expressions/affect. A meaning representation includes discourse information: what is contrastive/background information in the given context, and what is the “topic” or “theme” of the discourse? The system maps the meaning representation into how accents and their placement are chosen, how they are conveyed over facial expression, and how speech and facial expressions are coordinated. This determines a sequence of functional groups: lip shapes, conversational signals, punctuators, regulators, and manipulators. Our algorithms then impose synchrony, create coarticulation effects, and determine affectual signals, eye and head movements. The lowest-level representation is the Facial Action Coding System (FACS), which makes the generation system portable to other facial models.

We would like to thank Steve Platt for his facial model and for very useful comments, and Soetjianto and Khairol Yussof, who improved the facial model. We are also very grateful to Jean Griffin, Francisco Azuola, and Mike Edwards, who developed part of the animation software. All the work related to the voice synthesizer, speech, and intonation was done by Scott Prevost; we are very grateful to him. Finally, we would like to thank all the members of the graphics laboratory, especially Cary Phillips and Jianmin Zhao, for their helpful comments.
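
To make the pipeline concrete, the following minimal Python sketch illustrates the flavor of the stage described above in which functional groups are turned into timed FACS action units: here, two of the groups (conversational signals and punctuators) are mapped onto brow raises over pitch-accented segments and blinks at pauses, using phoneme timings from the synthesizer. All names in the sketch (Phoneme, conversational_signals, punctuators, the 0.15 s pause threshold, and the example timings) are hypothetical illustrations, not the authors' implementation; only the FACS action-unit numbers (AU1, AU2, AU45) are standard.

```python
from dataclasses import dataclass

# Standard FACS action units (AUs), the lowest-level representation the
# article targets. AU1/AU2 raise the brows; AU45 is a blink.
AU_INNER_BROW_RAISER = 1
AU_OUTER_BROW_RAISER = 2
AU_BLINK = 45

@dataclass
class Phoneme:
    symbol: str      # phoneme label from the synthesizer
    start: float     # onset time in seconds
    end: float       # offset time in seconds
    accented: bool   # True if this segment carries a pitch accent

def conversational_signals(phonemes):
    """Accent-linked signals: raise the eyebrows over pitch-accented
    segments. Returns (AU, start, end) triples."""
    actions = []
    for p in phonemes:
        if p.accented:
            actions.append((AU_INNER_BROW_RAISER, p.start, p.end))
            actions.append((AU_OUTER_BROW_RAISER, p.start, p.end))
    return actions

def punctuators(phonemes, pause_threshold=0.15):
    """Pause-linked signals: insert a blink at any pause longer than the
    (illustrative) threshold."""
    actions = []
    for prev, nxt in zip(phonemes, phonemes[1:]):
        if nxt.start - prev.end >= pause_threshold:
            actions.append((AU_BLINK, prev.end, prev.end + 0.10))
    return actions

# Illustrative input: "AMANDA came ..." with a pitch accent on the first
# syllable and a 0.20 s pause before "came".
utterance = [
    Phoneme("AH", 0.00, 0.12, True),
    Phoneme("M",  0.12, 0.20, False),
    Phoneme("AE", 0.20, 0.34, False),
    Phoneme("N",  0.34, 0.42, False),
    Phoneme("D",  0.42, 0.50, False),
    Phoneme("AH", 0.50, 0.60, False),
    Phoneme("K",  0.80, 0.88, False),
    Phoneme("EY", 0.88, 1.02, False),
    Phoneme("M",  1.02, 1.10, False),
]

script = sorted(conversational_signals(utterance) + punctuators(utterance),
                key=lambda a: a[1])
for au, t0, t1 in script:
    print(f"AU{au}: {t0:.2f}-{t1:.2f} s")
```

In the system described above, such (action unit, start, end) triples would then pass through the synchrony and coarticulation stages before driving a FACS-based facial model.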
