Pre-trained language model representations for language generation