Language Models are Unsupervised Multitask Learners
Alec Radford | Jeffrey Wu | Rewon Child | David Luan | Dario Amodei | Ilya Sutskever
[1] Frederick Jelinek, et al. Interpolated estimation of Markov source parameters from sparse data, 1980.
[2] Rich Caruana. Multitask Learning, 1997, Machine Learning.
[3] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.
[4] Hector J. Levesque, et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[5] Jason Weston, et al. Natural Language Processing (Almost) from Scratch, 2011, J. Mach. Learn. Res.
[6] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[7] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[8] Matthew E. Peters, et al. Content extraction using diverse feature sets, 2013, WWW.
[9] Thorsten Brants, et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.
[10] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[11] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[12] Omer Levy, et al. Neural Word Embedding as Implicit Matrix Factorization, 2014, NIPS.
[13] Oriol Vinyals, et al. Towards Principled Unsupervised Learning, 2015, ArXiv.
[14] Fei-Fei Li, et al. Visualizing and Understanding Recurrent Networks, 2015, ArXiv.
[15] Quoc V. Le, et al. Semi-supervised Sequence Learning, 2015, NIPS.
[16] Quoc V. Le, et al. A Neural Conversational Model, 2015, ArXiv.
[17] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[18] Felix Hill, et al. Learning Distributed Representations of Sentences from Unlabelled Data, 2016, NAACL.
[19] Yonghui Wu, et al. Exploring the Limits of Language Modeling, 2016, ArXiv.
[20] Jason Weston, et al. The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations, 2015, ICLR.
[21] Jason Weston, et al. Dialog-based Language Learning, 2016, NIPS.
[22] Oriol Vinyals, et al. Multilingual Language Processing From Bytes, 2015, NAACL.
[23] Rudolf Kadlec, et al. Embracing data abundance: BookTest Dataset for Reading Comprehension, 2016, ICLR.
[24] Joshua B. Tenenbaum, et al. Building machines that learn and think like people, 2016, Behavioral and Brain Sciences.
[25] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[26] Sandro Pezzelle, et al. The LAMBADA dataset: Word prediction requiring a broad discourse context, 2016, ACL.
[27] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.
[28] Bowen Zhou, et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond, 2016, CoNLL.
[29] Yejin Choi, et al. Story Cloze Task: UW NLP System, 2017, LSDSem@EACL.
[30] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[31] Nicolas Usunier, et al. Improving Neural Language Models with a Continuous Cache, 2016, ICLR.
[32] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[33] Ilya Sutskever, et al. Learning to Generate Reviews and Discovering Sentiment, 2017, ArXiv.
[34] Christopher D. Manning, et al. Get To The Point: Summarization with Pointer-Generator Networks, 2017, ACL.
[35] Quoc V. Le, et al. Unsupervised Pretraining for Sequence to Sequence Learning, 2016, EMNLP.
[36] Yang Yang, et al. Deep Learning Scaling is Predictable, Empirically, 2017, ArXiv.
[37] Richard Socher, et al. Learned in Translation: Contextualized Word Vectors, 2017, NIPS.
[38] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[39] Holger Schwenk, et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, 2017, EMNLP.
[40] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[41] Percy Liang, et al. Adversarial Examples for Evaluating Reading Comprehension Systems, 2017, EMNLP.
[42] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[43] Alex Wang, et al. Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling, 2018, ArXiv.
[44] Yann Dauphin, et al. Hierarchical Neural Story Generation, 2018, ACL.
[45] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[46] Guillaume Lample, et al. Word Translation Without Parallel Data, 2017, ICLR.
[47] Lukasz Kaiser, et al. Generating Wikipedia by Summarizing Long Sequences, 2018, ICLR.
[48] Alexander M. Rush, et al. Entity Tracking Improves Cloze-style Reading Comprehension, 2018, EMNLP.
[49] Alexander M. Rush, et al. Bottom-Up Abstractive Summarization, 2018, EMNLP.
[50] Jackie Chi Kit Cheung, et al. On the Evaluation of Common-Sense Reasoning in Natural Language Understanding, 2018, ArXiv.
[51] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[52] Richard Socher, et al. The Natural Language Decathlon: Multitask Learning as Question Answering, 2018, ArXiv.
[53] Di He, et al. FRAGE: Frequency-Agnostic Word Representation, 2018, NeurIPS.
[54] Christopher Joseph Pal, et al. Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning, 2018, ICLR.
[55] Eneko Agirre, et al. Unsupervised Neural Machine Translation, 2017, ICLR.
[56] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[57] Quoc V. Le, et al. A Simple Method for Commonsense Reasoning, 2018, ArXiv.
[58] Guillaume Lample, et al. Unsupervised Machine Translation Using Monolingual Corpora Only, 2017, ICLR.
[59] Benjamin Recht, et al. Do CIFAR-10 Classifiers Generalize to CIFAR-10?, 2018, ArXiv.
[60] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[61] Lei Yu, et al. Learning and Evaluating General Linguistic Intelligence, 2019, ArXiv.
[62] Zhitao Gong, et al. Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects, 2019, CVPR.
[63] Douwe Kiela, et al. No Training Required: Exploring Random Encoders for Sentence Classification, 2019, ICLR.
[64] Jong-Bok Kim, et al. The advantages and challenges of “big data”: Insights from the 14 billion word iWeb corpus, 2019, Linguistic Research.
[65] Danqi Chen, et al. CoQA: A Conversational Question Answering Challenge, 2018, TACL.
[66] Jason Weston, et al. Wizard of Wikipedia: Knowledge-Powered Conversational agents, 2018, ICLR.
[67] Eneko Agirre, et al. An Effective Approach to Unsupervised Machine Translation, 2019, ACL.
[68] Kenton Lee, et al. A BERT Baseline for the Natural Questions, 2019, ArXiv.
[69] Noah Constant, et al. Character-Level Language Modeling with Deeper Self-Attention, 2018, AAAI.
[70] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[71] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[72] Thomas Wolf, et al. TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents, 2019, ArXiv.
[73] Joachim Denzler, et al. Do We Train on Test Data? Purging CIFAR of Near-Duplicates, 2019, J. Imaging.