Task-Specific Skill Localization in Fine-tuned Language Models