Enhancing and Evaluating the Grammatical Framework Approach to Logic-to-Text Generation

Logic-to-text generation is an important yet underrepresented area of natural language generation (NLG). In particular, most previous work on this topic lacks sound evaluation. We address this limitation by building and evaluating a system that generates high-quality English text from a first-order logic (FOL) formula. We start by analyzing the performance of the system of Ranta (2011). Based on this analysis, we develop an extended version of that system, which we name LoLa, that performs formula simplification based on logical equivalences and syntactic transformations. We carry out an extensive evaluation of LoLa using standard automatic metrics and human evaluation, and compare the results against a baseline and the system of Ranta (2011). The results show that LoLa outperforms the other two systems in most aspects.

[1]  Markus N. Rabe,et al.  Autoformalization with Large Language Models , 2022, NeurIPS.

[2]  Sebastian Gehrmann,et al.  Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text , 2022, J. Artif. Intell. Res.

[3]  Mirella Lapata,et al.  Text Generation from Discourse Representation Structures , 2021, NAACL.

[4]  Alan F. Smeaton,et al.  Translation Quality Assessment: A Brief Survey on Manual and Automatic Methods , 2021, MOTRA.

[5]  Dimitra Gkatzia,et al.  Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions , 2020, INLG.

[6]  Adam Poliak,et al.  A survey on Recognizing Textual Entailment as an NLP Evaluation , 2020, EVAL4NLP.

[7]  Teresa Kouri Kissel,et al.  Classical logic , 2020, Classical and Nonclassical Logics.

[8]  Matthias Scheutz,et al.  Generating justifications for norm-related agent decisions , 2019, INLG.

[9]  Lysandre Debut,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[10]  Noah A. Smith,et al.  Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts , 2019, ACL.

[11]  Kilian Q. Weinberger,et al.  BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.

[12]  J. Chai,et al.  Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches , 2019, arXiv:1904.01172.

[13]  Pascual Martínez-Gómez,et al.  Neural sentence generation from formal semantics , 2018, INLG.

[14]  Matt Post,et al.  A Call for Clarity in Reporting BLEU Scores , 2018, WMT.

[15]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[16]  Claire Gardent,et al.  The WebNLG Challenge: Generating Text from RDF Data , 2017, INLG.

[17]  Verena Rieser,et al.  Why We Need New Evaluation Metrics for NLG , 2017, EMNLP.

[18]  Yejin Choi,et al.  Neural AMR: Sequence-to-Sequence Models for Parsing and Generation , 2017, ACL.

[19]  Emiel Krahmer,et al.  Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation , 2017, J. Artif. Intell. Res.

[20]  Ke Hu,et al.  A Comparative Study of Post-editing Guidelines , 2016, EAMT.

[21]  Matt J. Kusner,et al.  From Word Embeddings To Document Distances , 2015, ICML.

[22]  Matt Post,et al.  Efficient Elicitation of Annotations for Human Evaluation of Machine Translation , 2014, WMT@ACL.

[23]  Ion Androutsopoulos,et al.  Generating Natural Language Descriptions from OWL Ontologies: the NaturalOWL System , 2013, J. Artif. Intell. Res.

[24]  Jens Lehmann,et al.  Sorry, I don't speak SPARQL: translating SPARQL queries into natural language , 2013, WWW.

[25]  Elena Paslaru Bontas Simperl,et al.  SPARTIQULATION: Verbalizing SPARQL Queries , 2012, ILD@ESWC.

[26]  Kees van Deemter,et al.  Managing Ambiguity in Reference Generation: The Role of Surface Structure , 2012, Top. Cogn. Sci.

[27]  Aarne Ranta,et al.  Translating between Language and Logic: What Is Easy and What Is Difficult , 2011, CADE.

[28]  Hwee Tou Ng,et al.  A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions , 2011, EMNLP.

[29]  Elizabeth Coppock,et al.  A Translation from Logic to English with Dynamic Semantics , 2009, JSAI-isAI Workshops.

[30]  Ioannis Hatzilygeroudis,et al.  A Knowledge-based System for Translating FOL Formulas into NL Sentences , 2009, AIAI.

[31]  Raymond J. Mooney,et al.  Learning to sportscast: a test of grounded language acquisition , 2008, ICML '08.

[32]  Raymond J. Mooney,et al.  Generation by Inverting a Semantic Parser that Uses Statistical Machine Translation , 2007, NAACL.

[33]  Tom Minka,et al.  TrueSkill™: A Bayesian Skill Rating System , 2006, NIPS.

[34]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[35]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[36]  Raymond J. Mooney,et al.  Learning to Parse Database Queries Using Inductive Logic Programming , 1996, AAAI/IAAI, Vol. 2.

[37]  Ehud Reiter,et al.  Book Reviews: Building Natural Language Generation Systems , 2000, CL.

[38]  John D. Phillips,et al.  Generation of text from logical formulae , 1993, Machine Translation.

[39]  Stuart M. Shieber,et al.  The problem of logical-form equivalence , 1993.

[40]  Juen-tin Wang,et al.  On Computational Sentence Generation From Logical Form , 1980, COLING.

[41]  Johan Bos,et al.  Evaluating Text Generation from Discourse Representation Structures , 2021, GEM.

[42]  Kees van Deemter,et al.  Towards Generating Effective Explanations of Logical Formulas: Challenges and Strategies , 2020, NL4XAI.

[43]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[44]  Stephen Doherty,et al.  Approaches to Human and Machine Translation Quality Assessment , 2018.

[45]  Ioannis Hatzilygeroudis,et al.  Converting First Order Logic into Natural Language: A First Level Approach , 2007.

[46]  A. Elo.  The rating of chessplayers, past and present , 1978.

[47]  M. Kendall Statistical Methods for Research Workers , 1937, Nature.