Paper information - Evaluating the impact of variation in automatically generated embodied object descriptions

The primary task for any system that aims to automatically generate human-readable output is choice: the input to the system is usually well-specified, but there can be a wide range of options for creating a presentation based on that input. When designing such a system, an important decision is which aspects of the output are hard-wired and which allow for dynamic variation. Supporting dynamic choice requires additional representation and processing effort in the system, so it is important to ensure that incorporating variation has a positive effect on the generated output.

In this thesis, we concentrate on two types of output generated by a multimodal dialogue system: linguistic descriptions of objects drawn from a database, and conversational facial displays of an embodied talking head. In a series of experiments, each implementation adds a different type of variation to one of these types of output. The impact of each implementation is then assessed through a user evaluation in which human judges compare outputs generated by the basic version of the system to those generated by the modified version; in some cases, we also use automated metrics to compare the versions of the generated output.

This series of implementations and evaluations allows us to address three related issues. First, we explore the circumstances under which users perceive and appreciate variation in generated output. Second, we compare two methods of incorporating variation into the output of a corpus-based generation system. Third, we compare human judgements of output quality to the predictions of a range of automated metrics.

The results of the thesis are as follows. The judges generally preferred output that incorporated variation, except for a small number of cases where other aspects of the output obscured it or the variation was not marked. In general, the output of systems that chose the majority option was judged worse than that of systems that chose from a wider range of outputs. However, the results for non-verbal displays were mixed: users mildly preferred agent outputs where the facial displays were generated using stochastic techniques to those where a simple rule was used, but the stochastic facial displays decreased users’ ability to identify contextual tailoring in speech, while the rule-based displays did not. Finally, automated metrics based on simple corpus similarity favour generation strategies that do not diverge far from the average corpus examples, which are exactly the strategies that human judges tend to dislike. Automated metrics that measure other properties of the generated output correspond more closely to users’ preferences.
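The contrast between always choosing the majority option and choosing from a wider range of outputs can be illustrated with a small sketch. This is not the thesis's implementation: the corpus counts, the function names, the use of frequency-weighted sampling, and the toy similarity score are all assumptions made for illustration only.

```python
import random
from collections import Counter

# Hypothetical corpus: how often each phrasing was observed for one slot.
# These phrasings and counts are invented for illustration.
corpus_counts = Counter({
    "This design is": 6,
    "Here we have a": 3,
    "Take a look at this": 1,
})

def choose_majority(counts):
    """Always return the most frequent corpus option (no variation)."""
    return counts.most_common(1)[0][0]

def choose_weighted(counts, rng):
    """Sample an option with probability proportional to its corpus frequency."""
    options = list(counts)
    weights = [counts[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]

def corpus_similarity(outputs, counts):
    """Toy similarity score: mean relative corpus frequency of the chosen options."""
    total = sum(counts.values())
    return sum(counts[output] / total for output in outputs) / len(outputs)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
majority_outputs = [choose_majority(corpus_counts) for _ in range(20)]
weighted_outputs = [choose_weighted(corpus_counts, rng) for _ in range(20)]

majority_score = corpus_similarity(majority_outputs, corpus_counts)
weighted_score = corpus_similarity(weighted_outputs, corpus_counts)
print(majority_score, weighted_score)
```

Under this toy score the majority strategy can never score lower than the sampling strategy, because every sampled option has at most the majority option's relative frequency. That mirrors, in a very simplified form, why a similarity-based metric can favour the least varied output.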

Mary Ellen Foster
