Synthesizing Cooperative Conversation

We describe an implemented system that automatically generates and animates conversations between multiple human-like agents, with appropriate and synchronized speech, intonation, facial expressions, and hand gestures. A dialogue planner creates each conversation, producing both the text and the intonation of the utterances. The speaker/listener relationship, the text, and the intonation in turn drive the generators for facial expressions, lip motions, eye gaze, head motion, and arm gestures.
