Pre-trained language model representations for language generation

Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies for integrating pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN/DailyMail.
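The encoder-side integration described above can be illustrated with a minimal sketch (PyTorch), assuming a feature-based variant in which a frozen pre-trained language model supplies contextual states for the source sentence that are projected to the encoder dimension and summed with the token embeddings before a standard Transformer encoder. This is not the paper's exact implementation; the class and parameter names (LMAugmentedEncoder, lm_proj, lm_dim) are hypothetical.

```python
# Minimal sketch of feature-based integration of a pre-trained LM into a
# sequence-to-sequence encoder. Illustrative assumption only, not the
# paper's exact architecture; all names below are hypothetical.
# Positional encodings are omitted for brevity.
import torch
import torch.nn as nn


class LMAugmentedEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, lm_dim=1024, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Project frozen pre-trained LM states down to the encoder dimension.
        self.lm_proj = nn.Linear(lm_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, src_tokens, lm_states):
        # src_tokens: (batch, src_len) source token ids
        # lm_states:  (batch, src_len, lm_dim) contextual states from a
        #             frozen pre-trained language model run over the source
        x = self.embed(src_tokens) + self.lm_proj(lm_states)
        return self.encoder(x)


# Usage sketch with random tensors standing in for real data.
if __name__ == "__main__":
    enc = LMAugmentedEncoder(vocab_size=32000)
    tokens = torch.randint(0, 32000, (2, 7))   # (batch=2, src_len=7)
    lm_states = torch.randn(2, 7, 1024)        # pre-computed LM features
    out = enc(tokens, lm_states)               # (2, 7, 512)
    print(out.shape)
```

Because the pre-trained model is applied only once to the source side, the extra cost is confined to encoding, which is consistent with the abstract's observation that encoder-side integration slows inference modestly, whereas decoder-side integration would add cost at every generation step.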
