Empirically Estimating Order Constraints for Content Planning in Generation

In a language generation system, a content planner embodies one or more "plans" that are usually hand-crafted, sometimes through manual analysis of target text. In this paper, we present a system that automatically learns the elements of a plan and the ordering constraints among them. As training data, we use semantically annotated transcripts of domain experts performing the task our system is designed to mimic. Because the spoken language of the transcripts varies widely, we developed a novel algorithm, based on techniques from computational genomics, to find parallels between transcripts. We evaluated our methodology in two ways: the learning and generalization capabilities were evaluated quantitatively using cross-validation, obtaining an accuracy of 89%, and a qualitative evaluation is also provided.
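To make the notion of an ordering constraint concrete, the following is a minimal sketch and not the algorithm described in this paper. It assumes each transcript has already been reduced to a sequence of semantic tags, and it keeps a constraint "a precedes b" when that order holds in a large enough share of the transcripts containing both tags. The function name, the tag labels, and the thresholds `min_support` and `min_ratio` are hypothetical choices made for illustration.

```python
from itertools import combinations


def estimate_order_constraints(sequences, min_support=2, min_ratio=0.9):
    """Return {(a, b): ratio} for tag pairs where a precedes b.

    A pair is kept when at least `min_support` sequences contain both
    tags and a's first occurrence precedes b's in at least `min_ratio`
    of them. Both thresholds are illustrative assumptions.
    """
    before = {}  # (a, b) -> number of sequences in which a comes first
    for seq in sequences:
        # Record the position of the first occurrence of every tag.
        first = {}
        for i, tag in enumerate(seq):
            first.setdefault(tag, i)
        # Count the observed order for every pair of distinct tags.
        for a, b in combinations(sorted(first), 2):
            pair = (a, b) if first[a] < first[b] else (b, a)
            before[pair] = before.get(pair, 0) + 1

    constraints = {}
    for (a, b), n_ab in before.items():
        total = n_ab + before.get((b, a), 0)
        if total >= min_support and n_ab / total >= min_ratio:
            constraints[(a, b)] = n_ab / total
    return constraints


# Hypothetical tag sequences standing in for annotated transcripts.
transcripts = [
    ["age", "gender", "procedure", "drugs"],
    ["gender", "age", "procedure", "drugs"],
    ["age", "procedure", "drugs"],
]
constraints = estimate_order_constraints(transcripts)
for (a, b), ratio in sorted(constraints.items()):
    print(f"{a} precedes {b} (ratio {ratio:.2f})")
```

In this toy input, "age" and "gender" appear in both orders, so no constraint is kept between them, while every transcript places "procedure" before "drugs". Counting first occurrences only is a simplification; handling repeated or loosely aligned tags is where a pattern-discovery step would be needed.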
