Statistical Acquisition of Content Selection Rules for Natural Language Generation

A Natural Language Generation system produces text from semantic data given as input. One of its first tasks is to decide which pieces of information to convey in the output. This task, called Content Selection, is highly domain dependent and requires considerable re-engineering to port the system from one scenario to another. In this paper, we present a method to acquire content selection rules automatically from a corpus of text and associated semantics. We evaluated the proposed technique by comparing its output with the information selected by human authors in unseen texts; it filtered out half of the input data set without loss of recall.
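The evaluation criterion mentioned above (how much input is filtered out, and whether recall against the human-selected content is preserved) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `evaluate_selection`, the set-based representation of the data, and the attribute names are all hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple


def evaluate_selection(
    input_items: Set[str], selected: Set[str], gold: Set[str]
) -> Tuple[float, float]:
    """Return (recall, filtered_fraction) for one content selection run.

    input_items: all pieces of data available to the generator (assumed format)
    selected:    items kept by the content selection rules
    gold:        items that the human author actually conveyed in the text
    """
    kept = selected & input_items
    gold_in_input = gold & input_items
    # Recall: share of human-selected items that survive the filter.
    recall = len(kept & gold_in_input) / len(gold_in_input) if gold_in_input else 1.0
    # Filtered fraction: share of the input that the rules discard.
    filtered = 1 - len(kept) / len(input_items) if input_items else 0.0
    return recall, filtered


# Hypothetical toy data, not taken from the paper.
input_items = {"name", "birth_date", "occupation", "award", "height", "agent"}
selected = {"name", "birth_date", "occupation"}
gold = {"name", "occupation"}

recall, filtered = evaluate_selection(input_items, selected, gold)
print(f"recall={recall:.2f}, filtered={filtered:.2f}")
```

In this toy case the rules discard half of the input while keeping every item the human author used, which is the kind of outcome the abstract describes.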
