Automatic Generation of Multi-Modal Dialogue from Text Based on Discourse Structure Analysis

In this paper, we propose a novel method for automatically generating engaging multi-modal content from text. Rhetorical Structure Theory (RST) is used to decompose the text into discourse units and to identify the rhetorical relations that hold between them. These relations are then mapped to question-answer pairs in an information-preserving way; that is, the original text and the resulting dialogue convey essentially the same meaning. Finally, the dialogue is "acted out" by two virtual agents. The network of dialogue structures built up automatically during this process, called DialogueNet, can be reused for other purposes, such as personalization or question answering.
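
As a rough illustration of the relation-to-dialogue step, the sketch below turns one rhetorical relation into a question-answer exchange between two speakers. It is a minimal sketch under stated assumptions, not the mapping used in this work: the `Relation` class, the `TEMPLATES` table, the relation labels, and the question wordings are all hypothetical. It assumes the discourse units have already been segmented and labelled by an RST parser.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """One rhetorical relation between two discourse units (hypothetical structure)."""
    name: str       # relation label, e.g. "cause"
    nucleus: str    # the central discourse unit
    satellite: str  # the supporting discourse unit


# Hypothetical question templates, keyed by relation label.
# The question restates the nucleus; the answer supplies the satellite,
# so both units of the original text survive in the dialogue.
TEMPLATES = {
    "cause": "Why is it that {nucleus}?",
    "condition": "Under what condition is it true that {nucleus}?",
    "elaboration": "Can you tell me more about the fact that {nucleus}?",
}


def capitalize_first(text: str) -> str:
    """Upper-case only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def to_dialogue(rel: Relation) -> list[tuple[str, str]]:
    """Map a relation to a list of (speaker, utterance) turns."""
    template = TEMPLATES.get(rel.name)
    if template is None:
        # Unknown relation: fall back to a single statement by one speaker.
        statement = f"{capitalize_first(rel.nucleus)}. {capitalize_first(rel.satellite)}."
        return [("A", statement)]
    question = template.format(nucleus=rel.nucleus)
    answer = capitalize_first(rel.satellite) + "."
    return [("A", question), ("B", answer)]


if __name__ == "__main__":
    rel = Relation("cause", "the match was cancelled", "it rained all day")
    for speaker, utterance in to_dialogue(rel):
        print(f"{speaker}: {utterance}")
```

Keeping the nucleus in the question and the satellite in the answer is one simple way to approximate the information-preserving property described above; a real system would also need to handle nested relations and pronoun resolution.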
