A Global Model for Concept-to-Text Generation

Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input. We present a joint model that captures content selection ("what to say") and surface realization ("how to say") in an unsupervised domain-independent fashion. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). We recast generation as the task of finding the best derivation tree for a set of database records and describe an algorithm for decoding in this framework that allows to intersect the grammar with additional information capturing fluency and syntactic well-formedness constraints. Experimental evaluation on several domains achieves results competitive with state-of-the-art systems that use domain specific constraints, explicit feature engineering or labeled data.

[1]  Mirella Lapata,et al.  Concept-to-text Generation via Discriminative Reranking , 2012, ACL.

[2]  Ehud Reiter,et al.  Book Reviews: Building Natural Language Generation Systems , 2000, CL.

[3]  Nancy Green,et al.  Generation of Biomedical Arguments for Lay Readers , 2006, INLG.

[4]  Ehud Reiter,et al.  Generating Approximate Geographic Descriptions , 2009, ENLG.

[5]  Raymond J. Mooney,et al.  Generative Alignment and Semantic Parsing for Learning from Ambiguous Supervision , 2010, COLING.

[6]  Giorgio Gallo,et al.  Directed Hypergraphs and Applications , 1993, Discret. Appl. Math..

[7]  Giorgio Satta,et al.  The language intersection problem for non-recursive context-free grammars , 2004, Inf. Comput..

[8]  Eugene Charniak,et al.  Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking , 2005, ACL.

[9]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[10]  William J. Byrne,et al.  Hierarchical Phrase-based Translation Representations , 2011, EMNLP.

[11]  Dan Klein,et al.  Parsing and Hypergraphs , 2001, IWPT.

[12]  Ben Taskar,et al.  An End-to-End Discriminative Approach to Machine Translation , 2006, ACL.

[13]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[14]  Adwait Ratnaparkhi,et al.  Trainable approaches to surface natural language generation and their application to conversational dialog systems , 2002, Comput. Speech Lang..

[15]  Jim Hunter,et al.  Choosing words in computer-generated weather forecasts , 2005, Artif. Intell..

[16]  Alexander I. Rudnicky,et al.  Expanding the Scope of the ATIS Task: The ATIS-3 Corpus , 1994, HLT.

[17]  Anja Belz,et al.  Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models , 2008, Natural Language Engineering.

[18]  Mirella Lapata,et al.  Collective Content Selection for Concept-to-Text Generation , 2005, HLT.

[19]  Joshua Goodman,et al.  Semiring Parsing , 1999, CL.

[20]  Luke S. Zettlemoyer,et al.  Online Learning of Relaxed CCG Grammars for Parsing to Logical Form , 2007, EMNLP.

[21]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[22]  I. Good THE POPULATION FREQUENCIES OF SPECIES AND THE ESTIMATION OF POPULATION PARAMETERS , 1953 .

[23]  Raymond J. Mooney,et al.  Learning to sportscast: a test of grounded language acquisition , 2008, ICML '08.

[24]  Raymond J. Mooney,et al.  Generation by Inverting a Semantic Parser that Uses Statistical Machine Translation , 2007, NAACL.

[25]  Richard I. Kittredge,et al.  Using natural-language processing to produce weather forecasts , 1994, IEEE Expert.

[26]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[27]  Hwee Tou Ng,et al.  A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions , 2011, EMNLP.

[28]  Dan Klein,et al.  A Simple Domain-Independent Probabilistic Approach to Generation , 2010, EMNLP.

[29]  Dan Klein,et al.  Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.

[30]  Michele Banko,et al.  Headline Generation Based on Statistical Translation , 2000, ACL.

[31]  Jim Hunter,et al.  Generating English summaries of time series data using the Gricean maxims , 2003, KDD '03.

[32]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[33]  Brian S. Yandell,et al.  Practical Data Analysis for Designed Experiments , 1998 .

[34]  Daniel H. Younger,et al.  Recognition and Parsing of Context-Free Languages in Time n^3 , 1967, Inf. Control..

[35]  Dan Klein,et al.  Learning Semantic Correspondences with Less Supervision , 2009, ACL.

[36]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[37]  Tadao Kasami,et al.  An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages , 1965 .

[38]  David Chiang,et al.  Better k-best Parsing , 2005, IWPT.

[39]  Liang Huang,et al.  Forest Reranking: Discriminative Parsing with Non-Local Features , 2008, ACL.

[40]  Sabine Geldof,et al.  CORAL: using natural language generation for navigational assistance , 2003 .

[41]  Kathleen McKeown,et al.  Content Planner Construction via Evolutionary Algorithms and a Corpus-based Fitness Function , 2002, INLG.

[42]  David Chiang,et al.  Forest Rescoring: Faster Decoding with Integrated Language Models , 2007, ACL.

[43]  William J. Byrne,et al.  Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-n Grammars , 2010, CL.

[44]  Zhifei Li,et al.  First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests , 2009, EMNLP.

[45]  Stuart M. Shieber,et al.  Principles and Implementation of Deductive Parsing , 1994, J. Log. Program..

[46]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.