Inducing Document Plans for Concept-to-Text Generation

In a language generation system, a content planner selects which elements must be included in the output text and determines the order in which they appear. Recent empirical approaches perform content selection without any ordering and thus have no means of ensuring that the output is coherent. In this paper we focus on the problem of generating text from a database and present a trainable end-to-end generation system that includes both content selection and ordering. Content plans are represented intuitively by a set of grammar rules that operate on the document level and are acquired automatically from training data. We develop two approaches: the first is inspired by Rhetorical Structure Theory and represents the document as a tree of discourse relations between database records; the second requires little linguistic sophistication and uses tree structures to represent global patterns of database record sequences within a document. Experimental evaluation on two domains yields considerable improvements over the state of the art for both approaches.
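
To make the idea of document-level grammar rules over record sequences more concrete, the following toy sketch reads rules off a few training documents and estimates their probabilities by relative frequency. It is only an illustration under stated assumptions, not the induction procedure of the paper: the record type names, the `D` and `SENT(...)` symbols, and the function names are all hypothetical, and each document is assumed to be already segmented into sentences aligned to record types.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each document is a list of sentences, and each
# sentence is a tuple of the database record types it verbalizes.
docs = [
    [("skyCover",), ("temperature",), ("windDir", "windSpeed")],
    [("skyCover",), ("temperature",), ("windDir", "windSpeed")],
    [("temperature",), ("windDir", "windSpeed")],
]


def sentence_symbol(records):
    """Name the nonterminal for a sentence by the record types it covers."""
    return "SENT(" + ",".join(records) + ")"


def induce_rules(documents):
    """Collect rules D -> SENT... and SENT -> record types, with
    probabilities estimated as relative frequencies per left-hand side."""
    counts = defaultdict(Counter)
    for doc in documents:
        # Document-level rule: the sequence of sentence nonterminals.
        rhs = tuple(sentence_symbol(s) for s in doc)
        counts["D"][rhs] += 1
        # Sentence-level rule: the sequence of record types in the sentence.
        for s in doc:
            counts[sentence_symbol(s)][tuple(s)] += 1
    return {
        lhs: {rhs: n / sum(c.values()) for rhs, n in c.items()}
        for lhs, c in counts.items()
    }


rules = induce_rules(docs)
for lhs, expansions in rules.items():
    for rhs, p in expansions.items():
        print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

A rule such as `D -> SENT(skyCover) SENT(temperature) SENT(windDir,windSpeed)` then encodes both which records are selected and the order in which they are mentioned, which is the property the abstract attributes to grammar-based content plans.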
