Parameter-Efficient Transfer Learning for NLP
Neil Houlsby | Andrei Giurgiu | Stanislaw Jastrzebski | Bruna Morrone | Quentin de Laroussilhe | Andrea Gesmundo | Mona Attariyan | Sylvain Gelly