Learning to Reason for Text Generation from Scientific Tables

In this paper, we introduce SciGen, a new challenge dataset for the task of reasoning-aware data-to-text generation, consisting of tables from scientific articles and their corresponding descriptions. Describing scientific tables goes beyond surface realization of the table content and requires reasoning over table values. The unique properties of SciGen are that (1) tables mostly contain numerical values, and (2) the corresponding descriptions require arithmetic reasoning. SciGen is therefore the first dataset that assesses the arithmetic reasoning capabilities of generation models on complex input structures, i.e., tables from scientific articles. We study the effectiveness of state-of-the-art data-to-text generation models on SciGen and evaluate the results using common metrics as well as human evaluation. Our results and analyses show that (a) while humans routinely reason over table values when describing scientific tables, the ability of state-of-the-art models to do so is severely limited, (b) while adding more training data improves the results, it is not the solution for reasoning-aware text generation, and (c) one of the main bottlenecks for this task is the lack of proper automatic evaluation metrics. The data, code, and annotations for human evaluation will be available at https://github.com/UKPLab/SciGen. SciGen opens new avenues for future research in reasoning-aware text generation and evaluation.
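To make point (c) concrete, here is a minimal sketch of scoring a generated table description against a gold reference with a surface-overlap metric. It assumes the sacrebleu package is installed; the table values (accuracies of 85.2 and 83.1) and both description strings are hypothetical examples, not drawn from SciGen itself.

```python
# Minimal sketch: BLEU scoring of a reasoning-aware table description.
# Assumption: sacrebleu is installed (pip install sacrebleu).
import sacrebleu

# Hypothetical table: our model scores 85.2 accuracy, the baseline 83.1.
# Both descriptions correctly state the derived difference of 2.1 points,
# but with different surface wording.
generated = "Our model outperforms the baseline by 2.1 accuracy points."
reference = "The proposed model improves over the baseline by 2.1 points in accuracy."

# corpus_bleu takes a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu([generated], [[reference]])
print(f"BLEU: {bleu.score:.1f}")
```

Because BLEU rewards n-gram overlap with the reference rather than the correctness of the derived value (2.1), two equally faithful descriptions can receive very different scores, which is one reason reasoning-aware generation is hard to evaluate with existing automatic metrics.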
