Pretrained Language Models for Text Generation: A Survey

Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pretrained language models (PLMs). In this paper, we present an overview of the major advances achieved in the area of PLMs for text generation. As preliminaries, we introduce the general task definition and briefly describe the mainstream architectures of PLMs for text generation. As the core content, we discuss how to adapt existing PLMs to model different forms of input data and to satisfy special properties required of the generated text. We further summarize several important fine-tuning strategies for text generation. Finally, we outline several future directions and conclude the paper. Our survey aims to provide text generation researchers with a synthesis of, and pointers to, related research.
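To make the fine-tuning paradigm surveyed here concrete, the following is a minimal sketch of adapting a pretrained encoder-decoder PLM to a downstream generation task (abstractive summarization) with the Hugging Face transformers library. It is an illustration under assumptions, not a method from the survey: the checkpoint name, toy data, and hyperparameters are placeholders.

```python
# Minimal sketch (illustrative, not from the survey): full-parameter
# fine-tuning of a pretrained encoder-decoder PLM (BART) for summarization.
import torch
from torch.optim import AdamW
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Hypothetical toy data: (source document, target summary) pairs.
pairs = [("The quick brown fox jumps over the lazy dog.",
          "A fox jumps over a dog.")]

optimizer = AdamW(model.parameters(), lr=3e-5)
model.train()
for src, tgt in pairs:
    batch = tokenizer(src, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(tgt, return_tensors="pt", truncation=True, max_length=128).input_ids
    # Standard sequence-to-sequence fine-tuning: cross-entropy loss over the
    # target tokens, conditioned on the encoded source document.
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Decoding with the fine-tuned model (beam search).
model.eval()
inputs = tokenizer("Another document to summarize.", return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

This reflects the most common setup discussed in the survey, where all PLM parameters are updated on the downstream corpus; the fine-tuning strategies we summarize later vary this recipe rather than replace it.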
