Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation

A well-known limitation of the pretrain-finetune paradigm lies in its inflexibility caused by the one-size-fits-all vocabulary. This potentially weakens the effectiveness of pretrained models when they are applied to natural language generation (NLG) tasks, especially when the subword distributions of upstream and downstream tasks differ significantly. To address this problem, we extend the vanilla pretrain-finetune pipeline with an extra embedding transfer step. Specifically, a plug-and-play embedding generator is introduced to produce the representation of any input token based on the pretrained embeddings of its morphologically similar tokens. Thus, embeddings of mismatched tokens in downstream tasks can also be efficiently initialized. We conduct experiments on a variety of NLG tasks under the pretrain-finetune paradigm. Experimental results and extensive analyses show that the proposed strategy allows the vocabulary to be transferred freely, leading to more efficient and better-performing downstream NLG models.
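To make the embedding transfer step concrete, below is a minimal sketch of how such a generator could initialize the embedding of an out-of-vocabulary downstream token from pretrained embeddings of morphologically similar in-vocabulary tokens. The helper names (`char_ngrams`, `generate_embedding`), the use of character n-gram overlap as the similarity measure, and the similarity-weighted average are illustrative assumptions, not the paper's exact generator architecture.

```python
import numpy as np

def char_ngrams(token, n_min=3, n_max=4):
    """Character n-grams of a token, used here as a proxy for morphological similarity."""
    padded = f"<{token}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams

def generate_embedding(new_token, pretrained_vocab, pretrained_emb, top_k=5):
    """Produce an embedding for `new_token` by softly combining the pretrained
    embeddings of its most similar in-vocabulary tokens.
    Similarity is Jaccard overlap of character n-grams (an assumption for
    illustration; the paper's generator may use a learned scoring function)."""
    target = char_ngrams(new_token)
    scores = []
    for tok in pretrained_vocab:
        grams = char_ngrams(tok)
        union = len(target | grams)
        scores.append(len(target & grams) / union if union else 0.0)
    scores = np.asarray(scores)
    top = np.argsort(-scores)[:top_k]
    # Softmax over the top-k similarities gives mixing weights.
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return (weights[:, None] * pretrained_emb[top]).sum(axis=0)

# Toy usage: initialize an unseen downstream subword from a tiny pretrained table.
vocab = ["trans", "translate", "translation", "former", "model"]
emb = np.random.RandomState(0).randn(len(vocab), 8)
new_emb = generate_embedding("translator", vocab, emb)
print(new_emb.shape)  # (8,)
```

Because the generator only reads the pretrained embedding table, it can be attached to an existing checkpoint before finetuning without modifying the pretrained model itself.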
