Coordination and context-dependence in the generation of embodied conversation

We describe the generation of communicative actions in an implemented embodied conversational agent. Our agent plans each utterance so that multiple communicative goals can be realized opportunistically by a composite action comprising not only speech but also coverbal gesture that fits the context and the ongoing speech in ways representative of natural human conversation. We accomplish this by reasoning from a grammar that describes gesture declaratively in terms of its discourse function, its semantics, and its synchrony with speech.
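To make the idea concrete, the following is a minimal sketch of what a declarative gesture grammar entry might look like. All names, fields, and lexicon contents here are hypothetical illustrations, not the paper's actual representation: each entry records a gesture's discourse function, its semantics, and the speech fragment its stroke synchronizes with, and a planner opportunistically attaches a gesture when it matches the current goal and the planned speech.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GestureEntry:
    """One declarative grammar entry (illustrative fields, not the paper's)."""
    name: str                 # gesture identifier, e.g. a hand shape/trajectory
    discourse_function: str   # e.g. marks new ("rheme") vs. given ("theme") info
    semantics: str            # content predicate the gesture depicts
    synchrony: str            # spoken phrase the gesture stroke aligns with

# A toy lexicon; the entries are invented for illustration.
LEXICON = [
    GestureEntry("iconic-lift", "rheme", "lift(agent, object)", "picks up"),
    GestureEntry("deictic-point", "rheme", "location(object)", "right there"),
]

def plan_utterance(goal_semantics: str, speech: str) -> dict:
    """Opportunistically pair speech with a gesture whose semantics matches the
    communicative goal and whose synchrony phrase occurs in the planned speech."""
    for entry in LEXICON:
        if entry.semantics == goal_semantics and entry.synchrony in speech:
            return {"speech": speech,
                    "gesture": entry.name,
                    "stroke_on": entry.synchrony}
    # No matching gesture: fall back to speech alone.
    return {"speech": speech, "gesture": None, "stroke_on": None}
```

The key design point the sketch tries to capture is that the gesture is not scripted separately from the speech: both are licensed by the same grammar entry, so context and timing constraints are checked once, at planning time.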
