A Statistical, Grammar-Based Approach to Microplanning

Although there has been much work in recent years on data-driven natural language generation, little attention has been paid to the fine-grained interactions that arise during microplanning between aggregation, surface realization, and sentence segmentation. In this article, we propose a hybrid symbolic/statistical approach to jointly model the constraints regulating these interactions. Our approach integrates a small handwritten grammar, a statistical hypertagger, and a surface realization algorithm. It is applied to the verbalization of knowledge base queries and tested on 13 knowledge bases to demonstrate domain independence. We evaluate our approach in several ways. A quantitative analysis shows that the hybrid approach outperforms a purely symbolic approach in terms of both speed and coverage. Results from a human study indicate that users find the output of this hybrid symbolic/statistical system more fluent than that of either a template-based or a purely symbolic grammar-based approach. Finally, we illustrate by means of examples that our approach can account for various factors impacting aggregation, sentence segmentation, and surface realization.
