Data-to-Text Generation with Iterative Text Editing

We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve text fluency. To this end, we first transform data items to text using trivial templates, and then we iteratively improve the resulting text with a neural model trained for the sentence fusion task. The output of the model is filtered by a simple heuristic and reranked with an off-the-shelf pre-trained language model. We evaluate our approach on two major data-to-text datasets (WebNLG and Cleaned E2E) and analyze its caveats and benefits. Furthermore, we show that our formulation of data-to-text generation opens up the possibility of zero-shot domain adaptation using a general-domain dataset for sentence fusion.
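The template, fuse, filter, and rerank loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The input is assumed to be a list of RDF-style (subject, predicate, object) triples as in WebNLG. `TEMPLATES`, `lexicalize`, `keeps_entities`, `generate`, `toy_fuse`, and `toy_lm_score` are hypothetical names. The filtering heuristic is assumed to check that every subject and object string survives fusion. The fusion model and the language-model scorer are passed in as plain callables, standing in for a LaserTagger-style fusion model and GPT-2 scoring respectively.

```python
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical single-fact templates keyed by predicate (illustrative only).
TEMPLATES = {
    "birthPlace": "{s} was born in {o}.",
    "occupation": "{s} worked as {o}.",
}


def lexicalize(triple: Triple) -> str:
    """Turn one triple into a sentence with a trivial template."""
    s, p, o = triple
    # Generic fallback for predicates without a hand-written template.
    template = TEMPLATES.get(p, "{s} {p} {o}.")
    return template.format(s=s, p=p, o=o)


def keeps_entities(text: str, triples: Sequence[Triple]) -> bool:
    """Assumed filter: every subject and object must appear verbatim in the text."""
    return all(s in text and o in text for s, _, o in triples)


def generate(
    triples: Sequence[Triple],
    fuse: Callable[[str, str], List[str]],
    lm_score: Callable[[str], float],
) -> str:
    """Start from the first templated fact and fuse in one new fact per step."""
    if not triples:
        return ""
    text = lexicalize(triples[0])
    for i in range(1, len(triples)):
        sentence = lexicalize(triples[i])
        # Keep only fusion candidates that still mention all facts seen so far.
        valid = [c for c in fuse(text, sentence) if keeps_entities(c, triples[: i + 1])]
        if valid:
            text = max(valid, key=lm_score)  # rerank: higher score = preferred
        else:
            text = f"{text} {sentence}"  # concatenation is always complete
    return text


def toy_fuse(first: str, second: str) -> List[str]:
    """Stand-in for a sentence-fusion model: returns candidate fusions."""
    a, b = first.split(), second.split()
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1  # length of the shared leading words (here: the subject)
    candidates = [f"{first} {second}"]
    if 0 < k < len(b):
        # Drop the repeated subject and join the predicates with "and".
        candidates.append(f"{first.rstrip('.')} and {' '.join(b[k:])}")
    return candidates


def toy_lm_score(text: str) -> float:
    """Stand-in for an LM score: simply prefers shorter texts."""
    return -float(len(text.split()))


if __name__ == "__main__":
    facts = [
        ("Alan Bean", "birthPlace", "Wheeler, Texas"),
        ("Alan Bean", "occupation", "a test pilot"),
    ]
    print(generate(facts, toy_fuse, toy_lm_score))
    # prints: Alan Bean was born in Wheeler, Texas and worked as a test pilot.
```

In a real system, `fuse` would return several hypotheses from the trained fusion model and `lm_score` would be a language-model log-probability; the word-count scorer here only mimics a preference for more compact text. Falling back to plain concatenation when every candidate fails the filter keeps the output complete at the cost of fluency, which reflects the sketch's assumed priority of semantic accuracy over fluency.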