Data-to-Text Generation with Iterative Text Editing

We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency. To this end, we first transform data items to text using trivial templates, and then we iteratively improve the resulting text by a neural model trained for the sentence fusion task. The output of the model is filtered by a simple heuristic and reranked with an off-the-shelf pre-trained language model. We evaluate our approach on two major data-to-text datasets (WebNLG, Cleaned E2E) and analyze its caveats and benefits. Furthermore, we show that our formulation of data-to-text generation opens up the possibility for zero-shot domain adaptation using a general-domain dataset for sentence fusion.

Ondrej Dusek | Zdenek Kasner
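
The pipeline outlined in the abstract (template lexicalization, iterative sentence fusion, heuristic filtering, language-model reranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `TEMPLATES` table, the `keeps_all_values` check, and the fallback to the unfused text are hypothetical, and `toy_fuse` and `toy_lm_score` are stand-ins for a trained fusion model such as LaserTagger and a pre-trained language model such as GPT-2.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical trivial templates, one per predicate (not taken from the paper).
TEMPLATES = {
    "birthPlace": "{s} was born in {o}.",
    "occupation": "{s} worked as {o}.",
}


def lexicalize(triple: Triple) -> str:
    """Turn one data item into a sentence with a trivial template."""
    s, p, o = triple
    template = TEMPLATES.get(p, "{s} {p} {o}.")
    return template.format(s=s, p=p, o=o)


def keeps_all_values(text: str, triples: List[Triple]) -> bool:
    """Assumed filter: every subject and object must still appear verbatim."""
    return all(s in text and o in text for s, _, o in triples)


def generate(
    triples: List[Triple],
    fuse: Callable[[str], List[str]],
    lm_score: Callable[[str], float],
) -> str:
    """Add one fact per step, fuse, filter the candidates, rerank them."""
    text = lexicalize(triples[0])
    for i in range(1, len(triples)):
        # Unfused text: previous output followed by the next template sentence.
        fallback = text + " " + lexicalize(triples[i])
        candidates = [
            c for c in fuse(fallback) if keeps_all_values(c, triples[: i + 1])
        ]
        # Assumption: keep the unfused text if no candidate passes the filter.
        text = max(candidates, key=lm_score) if candidates else fallback
    return text


def toy_fuse(text: str) -> List[str]:
    """Stand-in for a fusion model: one useful rewrite, one lossy rewrite."""
    return [text.replace(". Alan Turing", " and"), "Alan Turing was born."]


def toy_lm_score(text: str) -> float:
    """Stand-in for a language-model score: prefers shorter outputs."""
    return -float(len(text.split()))


triples = [
    ("Alan Turing", "birthPlace", "London"),
    ("Alan Turing", "occupation", "a mathematician"),
]
print(generate(triples, toy_fuse, toy_lm_score))
```

In this toy run the lossy candidate is rejected by the filter because it drops "London", so the fused sentence is the only one left to rerank. Swapping in real models would only require replacing the two callables passed to `generate`.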