Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model

This study examines the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not yet known whether semi-supervised learning remains helpful when a system is also supplemented with a large-scale language model. This study aims to answer that question by comparing a data-to-text system supplemented only with a language model to two data-to-text systems that are additionally enriched with a semi-supervised learning approach: data augmentation or pseudo-labeling. Results show that semi-supervised learning leads to higher scores on diversity metrics. In terms of output quality, extending the training set of the language-model-based data-to-text system with the pseudo-labeling approach increased text quality scores, whereas the data augmentation approach yielded scores similar to those of the system without a training set extension. These results indicate that semi-supervised learning approaches can bolster output quality and diversity, even when a language model is also present.
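To make the semi-supervised extension concrete, the sketch below illustrates the general pseudo-labeling idea: a sequence-to-sequence model fine-tuned on the small labeled set generates silver-standard texts for unlabeled data records, and the resulting pairs are added to the training set before retraining. This is a minimal illustration under stated assumptions, not the paper's implementation; the HuggingFace T5 model, the `linearize` helper, and the key[value] record format are hypothetical choices made for the example.

```python
# Minimal pseudo-labeling sketch (illustrative; not the paper's implementation).
# Assumed: a T5-style seq2seq model already fine-tuned on the small labeled set.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def linearize(record: dict) -> str:
    # Hypothetical input format: flatten a data record into "key[value]" pairs,
    # e.g. {"name": "Aromi", "food": "Chinese"} -> "name[Aromi], food[Chinese]".
    return ", ".join(f"{k}[{v}]" for k, v in record.items())

def pseudo_label(unlabeled_records, max_new_tokens=64):
    # Generate silver-standard texts for unlabeled data records.
    silver_pairs = []
    for record in unlabeled_records:
        inputs = tokenizer(linearize(record), return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
        silver_pairs.append((record, text))
    return silver_pairs

# The extended training set is the union of the gold pairs and the silver
# pairs, on which the data-to-text model is then retrained:
# training_set = gold_pairs + pseudo_label(unlabeled_records)
```

A data augmentation approach, by contrast, would typically expand the labeled set itself (for instance by LM-based paraphrasing or lexical substitution of the reference texts) rather than drawing on unlabeled data records.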
