Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel | Noam Shazeer | Adam Roberts | Katherine Lee | Sharan Narang | Michael Matena | Yanqi Zhou | Wei Li | Peter J. Liu