Parameter-Efficient Transfer Learning with Diff Pruning
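Since the page carries no abstract, the sketch below illustrates the diff-pruning idea named in the title: keep the pretrained weights frozen and learn a sparse, task-specific difference vector whose entries are gated by a hard-concrete relaxation of L0 regularization (cf. references [27], [37], and [44] below). This is a minimal illustration under assumptions, not the paper's implementation; the class name `DiffPrunedLinear`, the layer shape, and the hyperparameters (`BETA`, `GAMMA`, `ZETA`, the `1e-4` sparsity weight) are choices made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standard stretch parameters for the hard-concrete gate (Louizos et al., ref. [44]).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1

class DiffPrunedLinear(nn.Module):
    """Frozen pretrained linear layer plus a learned sparse diff (illustrative helper)."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Frozen pretrained weights: stored as plain tensors, not parameters.
        self.weight = pretrained.weight.detach()
        self.bias = pretrained.bias.detach() if pretrained.bias is not None else None
        # Trainable diff magnitudes and per-weight gate logits.
        self.w = nn.Parameter(torch.zeros_like(self.weight))
        self.log_alpha = nn.Parameter(torch.ones_like(self.weight))

    def gate(self) -> torch.Tensor:
        if self.training:
            # Stochastic hard-concrete sample via the reparameterization trick.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / BETA)
        else:
            # Deterministic gate estimate at evaluation time.
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (GAMMA, ZETA), then clip so most gates are exactly 0 or 1.
        return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

    def l0_penalty(self) -> torch.Tensor:
        # Expected number of non-zero gates: differentiable surrogate for ||delta||_0.
        shift = BETA * torch.log(torch.tensor(-GAMMA / ZETA))
        return torch.sigmoid(self.log_alpha - shift).sum()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.gate() * self.w  # sparse task-specific diff
        return F.linear(x, self.weight + delta, self.bias)

# Usage: only the diff parameters are optimized; the L0 term drives sparsity.
layer = DiffPrunedLinear(nn.Linear(768, 768))
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-3)
x, target = torch.randn(4, 768), torch.randn(4, 768)
loss = F.mse_loss(layer(x), target) + 1e-4 * layer.l0_penalty()
loss.backward()
optimizer.step()
```

After training, entries whose gates collapse to zero can be dropped, so only the sparse diff vector needs to be stored per task while the pretrained backbone is shared.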
[1] Martin Jaggi, et al. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models, 2020, EMNLP.
[2] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[3] Xin Wang, et al. How fine can fine-tuning be? Learning efficient language models, 2020, AISTATS.
[4] Alexander M. Rush, et al. Low-Complexity Probing via Finding Subnetworks, 2021, NAACL.
[5] Quoc V. Le, et al. Semi-supervised Sequence Learning, 2015, NIPS.
[6] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[7] Kevin Duh, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP@ACL.
[8] Yonatan Belinkov, et al. Similarity Analysis of Contextual Word Representation Models, 2020, ACL.
[9] Iryna Gurevych, et al. MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer, 2020, EMNLP.
[10] Byoung-Tak Zhang, et al. Overcoming Catastrophic Forgetting by Incremental Moment Matching, 2017, NIPS.
[11] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[12] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[13] Thomas Wolf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[14] Olatunji Ruwase, et al. ZeRO: Memory Optimization Towards Training A Trillion Parameter Models, 2019, SC.
[15] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[16] Nan Hua, et al. Universal Sentence Encoder for English, 2018, EMNLP.
[17] OctoMiao. Overcoming catastrophic forgetting in neural networks, 2016.
[18] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[19] Sanja Fidler, et al. Skip-Thought Vectors, 2015, NIPS.
[20] Shiyu Chang, et al. The Lottery Ticket Hypothesis for Pre-trained BERT Networks, 2020, NeurIPS.
[21] Felix Hill, et al. Learning Distributed Representations of Sentences from Unlabelled Data, 2016, NAACL.
[22] Mariana L. Neves, et al. Neural Domain Adaptation for Biomedical Question Answering, 2017, CoNLL.
[23] Holger Schwenk, et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, 2017, EMNLP.
[24] Rich Caruana, et al. Multitask Learning, 1998, Encyclopedia of Machine Learning and Data Mining.
[25] Iryna Gurevych, et al. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, 2019, EMNLP.
[26] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[27] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.
[28] Dipanjan Das, et al. BERT Rediscovers the Classical NLP Pipeline, 2019, ACL.
[29] Guanghui Qin, et al. Learning How to Ask: Querying LMs with Mixtures of Soft Prompts, 2021, NAACL.
[30] Preslav Nakov, et al. Poor Man's BERT: Smaller and Faster Transformer Models, 2020, ArXiv.
[31] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[32] Christopher Joseph Pal, et al. Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning, 2018, ICLR.
[33] Quoc V. Le, et al. BAM! Born-Again Multi-Task Networks for Natural Language Understanding, 2019, ACL.
[34] Iryna Gurevych, et al. AdapterHub: A Framework for Adapting Transformers, 2020, EMNLP.
[35] Alexander M. Rush, et al. Movement Pruning: Adaptive Sparsity by Fine-Tuning, 2020, NeurIPS.
[36] Xiaodong Liu, et al. Multi-Task Deep Neural Networks for Natural Language Understanding, 2019, ACL.
[37] Yee Whye Teh, et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, 2016, ICLR.
[38] Stefan Wermter, et al. Continual Lifelong Learning with Neural Networks: A Review, 2019, Neural Networks.
[39] Percy Liang, et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation, 2021, ACL.
[40] Marc'Aurelio Ranzato, et al. Gradient Episodic Memory for Continual Learning, 2017, NIPS.
[41] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[42] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[43] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[44] Max Welling, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.
[45] Sanjeev Arora, et al. A Simple but Tough-to-Beat Baseline for Sentence Embeddings, 2017, ICLR.
[46] Iryna Gurevych, et al. AdapterFusion: Non-Destructive Task Composition for Transfer Learning, 2021, EACL.
[47] Svetlana Lazebnik, et al. Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights, 2018, ECCV.
[48] Iain Murray, et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, 2019, ICML.
[49] Kwan Hui Lim, et al. An Unsupervised Sentence Embedding Method by Mutual Information Maximization, 2020, EMNLP.
[50] Ziheng Wang, et al. Structured Pruning of Large Language Models, 2020, EMNLP.
[51] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[52] Jiwon Kim, et al. Continual Learning with Deep Generative Replay, 2017, NIPS.
[53] Quoc V. Le, et al. Distributed Representations of Sentences and Documents, 2014, ICML.
[54] Kevin Gimpel, et al. Towards Universal Paraphrastic Sentence Embeddings, 2015, ICLR.
[55] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[56] Naveen Arivazhagan, et al. Language-agnostic BERT Sentence Embedding, 2020, ArXiv.
[57] Yee Whye Teh, et al. Progress & Compress: A scalable framework for continual learning, 2018, ICML.
[58] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[59] Hinrich Schütze, et al. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners, 2020, NAACL.
[60] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[61] Richard Socher, et al. Learned in Translation: Contextualized Word Vectors, 2017, NIPS.
[62] R. French. Catastrophic forgetting in connectionist networks, 1999, Trends in Cognitive Sciences.
[63] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[64] Rameswar Panda, et al. AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning, 2020, NeurIPS.
[65] Wei Li, et al. Learning Universal Sentence Representations with Mean-Max Attention Autoencoder, 2018, EMNLP.
[66] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[67] Rico Sennrich, et al. Regularization techniques for fine-tuning in neural machine translation, 2017, EMNLP.
[68] Yin Yang, et al. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT, 2020, Transactions of the Association for Computational Linguistics.
[69] Andrea Vedaldi, et al. Efficient Parametrization of Multi-domain Deep Neural Networks, 2018, CVPR.
[70] Rogério Schmidt Feris, et al. SpotTune: Transfer Learning Through Adaptive Fine-Tuning, 2019, CVPR.
[71] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[72] Leonidas Guibas, et al. Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks, 2019, ECCV.
[73] Alexei A. Efros, et al. Dataset Distillation, 2018, ArXiv.
[74] Ming-Wei Chang, et al. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019.
[75] Olatunji Ruwase, et al. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, 2020, SC20.
[76] Yonatan Belinkov, et al. Linguistic Knowledge and Transferability of Contextual Representations, 2019, NAACL.
[77] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[78] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[79] Kyunghyun Cho, et al. Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models, 2020, ICLR.
[80] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.