How fine can fine-tuning be? Learning efficient language models
[1] Yoshua Bengio, et al. BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1, 2016, ArXiv.
[2] Peter Clark, et al. The Seventh PASCAL Recognizing Textual Entailment Challenge, 2011, TAC.
[3] Samy Bengio, et al. Are All Layers Created Equal?, 2019, JMLR.
[4] Svetlana Lazebnik, et al. Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights, 2018, ECCV.
[5] Jason Yosinski, et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, 2019, NeurIPS.
[6] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[7] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[8] Ido Dagan, et al. The Sixth PASCAL Recognizing Textual Entailment Challenge, 2010, TAC.
[9] Suyog Gupta, et al. To prune, or not to prune: exploring the efficacy of pruning for model compression, 2017, ICLR.
[10] Roy Bar-Haim, et al. The Second PASCAL Recognising Textual Entailment Challenge, 2006.
[11] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[12] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016, ArXiv.
[13] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations, 2015, NIPS.
[14] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[15] Bruno A. Olshausen, et al. Superposition of many models into one, 2019, NeurIPS.
[16] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[17] Barbara Caputo, et al. Adding New Tasks to a Single Network with Weight Transformations using Binary Masks, 2018, ECCV Workshops.
[18] Samuel R. Bowman, et al. Neural Network Acceptability Judgments, 2018, TACL.
[19] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.
[20] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[21] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[22] Yang Yang, et al. Deep Learning Scaling is Predictable, Empirically, 2017, ArXiv.
[23] Yang Song, et al. Extreme Language Model Compression with Optimal Subwords and Shared Projections, 2019, ArXiv.
[24] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[25] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[26] Mikhail Belkin, et al. Reconciling modern machine learning and the bias-variance trade-off, 2018, ArXiv.
[27] Eneko Agirre, et al. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, 2017, SemEval.
[28] Chris Brockett, et al. Automatically Constructing a Corpus of Sentential Paraphrases, 2005, IJCNLP.
[29] Max Welling, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.
[30] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[31] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[32] Hector J. Levesque, et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.