Controlling hallucinations at word level in data-to-text generation

Data-to-Text Generation (DTG) is a subfield of Natural Language Generation that aims to transcribe structured data into natural-language descriptions. The field has recently been boosted by neural generators, which on the one hand exhibit strong syntactic skills without the need for hand-crafted pipelines; on the other hand, the quality of the generated text reflects the quality of the training data, which in realistic settings offers only imperfectly aligned structure-text pairs. Consequently, state-of-the-art neural models include misleading statements, usually called hallucinations, in their outputs. Controlling this phenomenon is today a major challenge for DTG, and it is the problem addressed in this paper. Previous work deals with the issue at the instance level, using an alignment score for each table-reference pair. In contrast, we propose a finer-grained approach, arguing that hallucinations should instead be treated at the word level. Specifically, we propose a Multi-Branch Decoder that leverages word-level labels to learn the relevant parts of each training instance. These labels are obtained with a simple and efficient scoring procedure based on co-occurrence analysis and dependency parsing. Extensive evaluations, via automated metrics and human judgment on the standard WikiBio benchmark, show the accuracy of our alignment labels and the effectiveness of the proposed Multi-Branch Decoder. Our model reduces and controls hallucinations while preserving fluency and coherence in the generated texts. Further experiments on a degraded version of ToTTo show that it can be applied successfully in very noisy settings.
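To make the kind of word-level scoring procedure described above concrete, the sketch below labels each token of a reference sentence by combining a co-occurrence check against the table values with a propagation step along dependency edges (parsed with Stanza). The scoring rule, the set of propagated relations, and the 0.8 decay factor are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of word-level hallucination labeling: co-occurrence analysis
# plus dependency-based propagation. Thresholds and relation sets are
# illustrative assumptions.
import stanza

stanza.download("en", verbose=False)  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse", verbose=False)

# Function-word relations allowed to inherit support from their head,
# so that e.g. "in" is kept when the date it attaches to is in the table.
PROPAGATED_RELS = {"case", "det", "aux", "cop", "mark", "cc"}

def label_tokens(table, reference, threshold=0.5):
    """Return (token, label) pairs; label 0 means likely hallucinated."""
    # 1) Co-occurrence: a token is supported if its lemma appears in the table.
    table_lemmas = {
        (w.lemma or w.text).lower()
        for cell in table.values()
        for sent in nlp(str(cell)).sentences
        for w in sent.words
    }
    labels = []
    for sent in nlp(reference).sentences:
        scores = [1.0 if (w.lemma or w.text).lower() in table_lemmas else 0.0
                  for w in sent.words]
        # 2) Dependency propagation (single pass; head index is 1-based, 0 = root).
        for i, w in enumerate(sent.words):
            if w.head > 0 and w.deprel in PROPAGATED_RELS:
                scores[i] = max(scores[i], 0.8 * scores[w.head - 1])
        labels.extend((w.text, int(s >= threshold)) for w, s in zip(sent.words, scores))
    return labels

table = {"name": "John Smith", "birth_date": "1946", "occupation": "engineer"}
print(label_tokens(table, "John Smith, born in 1946, was a famous engineer."))
# "famous" has no support in the table and ends up labeled 0 (hallucinated).
```

Similarly, the following is a toy sketch of one step of a multi-branch decoding scheme, assuming two GRU branches whose hidden states are mixed by per-token weights: during training the weights would be driven by the word-level labels above, while at inference they become a user-chosen control knob (e.g. favoring the "factual" branch). The branch count, dimensions, and mixing rule are assumptions for illustration, not the paper's architecture.

```python
# Toy multi-branch decoder step in PyTorch: parallel GRU branches mixed by
# label-driven weights. All hyperparameters are illustrative.
import torch
import torch.nn as nn

class MultiBranchDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128, n_branches=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.branches = nn.ModuleList(
            nn.GRUCell(emb_dim, hid_dim) for _ in range(n_branches)
        )
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_token, states, weights):
        """prev_token: (batch,) ids; states: one (batch, hid) tensor per branch;
        weights: (batch, n_branches) mixing weights."""
        x = self.embed(prev_token)
        states = [cell(x, h) for cell, h in zip(self.branches, states)]
        mixed = sum(w.unsqueeze(1) * h
                    for w, h in zip(weights.unbind(dim=1), states))
        return self.out(mixed), states

decoder = MultiBranchDecoder(vocab_size=1000)
batch = 4
states = [torch.zeros(batch, 128) for _ in range(2)]
weights = torch.tensor([[1.0, 0.0]] * batch)  # fully favor the "factual" branch
logits, states = decoder(torch.zeros(batch, dtype=torch.long), states, weights)
print(logits.shape)  # torch.Size([4, 1000])
```

Mixing hidden states rather than hard-routing tokens keeps the whole step differentiable, which is what makes the weights usable both as soft training targets and as a continuous control at inference time.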
