Distilling Knowledge Learned in BERT for Text Generation
Zhe Gan | Jingjing Liu | Yen-Chun Chen | Yu Cheng | Jingzhou Liu
[1] Salim Roukos, et al. BLEU: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[2] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL.
[3] Rich Caruana, et al. Model Compression, 2006, KDD.
[4] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[5] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[6] Nitish Srivastava, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014, J. Mach. Learn. Res.
[7] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[8] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[9] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, arXiv.
[10] Christopher D. Manning, et al. Stanford Neural Machine Translation Systems for Spoken Language Domains, 2015, IWSLT.
[11] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[12] Jason Weston, et al. A Neural Attention Model for Abstractive Sentence Summarization, 2015, EMNLP.
[13] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Samy Bengio, et al. Show and Tell: A Neural Image Caption Generator, 2015, CVPR.
[16] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.
[17] Christopher Potts, et al. A Large Annotated Corpus for Learning Natural Language Inference, 2015, EMNLP.
[18] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[19] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[20] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[21] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.
[22] Bowen Zhou, et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond, 2016, CoNLL.
[23] Lukasz Kaiser, et al. Attention is All You Need, 2017, NIPS.
[24] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[25] Zhe Gan, et al. Semantic Compositional Networks for Visual Captioning, 2017, CVPR.
[26] Luca Antiga, et al. Automatic Differentiation in PyTorch, 2017.
[27] Richard Socher, et al. Learned in Translation: Contextualized Word Vectors, 2017, NIPS.
[28] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[29] Quoc V. Le, et al. Semi-Supervised Sequence Modeling with Cross-View Training, 2018, EMNLP.
[30] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[31] Xu Sun, et al. Global Encoding for Abstractive Summarization, 2018, ACL.
[32] Marc'Aurelio Ranzato, et al. Classical Structured Prediction Losses for Sequence to Sequence Learning, 2017, NAACL.
[33] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[34] Furu Wei, et al. Faithful to the Original: Fact Aware Neural Abstractive Summarization, 2017, AAAI.
[35] Yen-Chun Chen, et al. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting, 2018, ACL.
[36] Christopher Joseph Pal, et al. Twin Networks: Matching the Future for Sequence Generation, 2017, ICLR.
[37] Alexander M. Rush, et al. OpenNMT: Neural Machine Translation Toolkit, 2018, AMTA.
[38] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[39] Myle Ott, et al. Scaling Neural Machine Translation, 2018, WMT.
[40] Seung-won Hwang, et al. Entity Commonsense Representation for Neural Abstractive Summarization, 2018, NAACL.
[41] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[42] Guillaume Lample, et al. Unsupervised Machine Translation Using Monolingual Corpora Only, 2017, ICLR.
[43] Xu Tan, et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation, 2019, ICML.
[44] Omer Levy, et al. Constant-Time Machine Translation with Conditional Masked Language Models, 2019, IJCNLP.
[45] Di He, et al. Multilingual Neural Machine Translation with Knowledge Distillation, 2019, ICLR.
[46] Zhe Gan, et al. Improving Sequence-to-Sequence Learning via Optimal Transport, 2019, ICLR.
[47] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[48] Enhong Chen, et al. Regularizing Neural Machine Translation by Target-bidirectional Agreement, 2018, AAAI.
[49] Ji Wang, et al. Pretraining-Based Natural Language Generation for Text Summarization, 2019, CoNLL.
[50] Alex Wang, et al. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model, 2019, Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation.
[51] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[52] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.
[53] Yann Dauphin, et al. Pay Less Attention with Lightweight and Dynamic Convolutions, 2019, ICLR.
[54] Towards Making the Most of BERT in Neural Machine Translation, 2019, AAAI.