Songfang Huang | Baobao Chang | Chuanqi Tan | Fei Huang | Zhiyuan Zhang | Fuli Luo | Runxin Xu
[1] Kilian Q. Weinberger, et al. Revisiting Few-sample BERT Fine-tuning, 2020, arXiv.
[2] Hossein Mobahi, et al. Sharpness-Aware Minimization for Efficiently Improving Generalization, 2020, arXiv.
[3] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[4] Martin Jaggi, et al. Dynamic Model Pruning with Feedback, 2020, ICLR.
[5] Marius Mosbach, et al. On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines, 2020, arXiv.
[6] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[7] Hanan Samet, et al. Pruning Filters for Efficient ConvNets, 2016, ICLR.
[8] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.
[9] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[10] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[11] Yu Cao, et al. Ranking the parameters of deep neural networks using the Fisher information, 2016, ICASSP.
[12] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[13] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[14] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[15] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[16] Jian Zhang, et al. Natural Language Inference over Interaction Space, 2017, ICLR.
[17] Hal Daumé, et al. Frustratingly Easy Domain Adaptation, 2007, ACL.
[18] Samuel R. Bowman, et al. Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks, 2018, arXiv.
[19] Xu Sun, et al. Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption, 2020, arXiv.
[20] Suyog Gupta, et al. To prune, or not to prune: exploring the efficacy of pruning for model compression, 2017, ICLR.
[21] Peter Clark, et al. SciTaiL: A Textual Entailment Dataset from Science Question Answering, 2018, AAAI.
[22] Marco Marelli, et al. A SICK cure for the evaluation of compositional distributional semantic models, 2014, LREC.
[23] Jianfeng Gao, et al. SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization, 2019, ACL.
[24] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[25] Armen Aghajanyan, et al. Better Fine-Tuning by Reducing Representational Collapse, 2020, ICLR.
[26] Yonatan Belinkov, et al. Variational Information Bottleneck for Effective Low-Resource Fine-Tuning, 2021, ICLR.
[27] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[28] Sebastian Ruder, et al. Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks, 2021, ACL.
[29] Kyunghyun Cho, et al. Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models, 2020, ICLR.
[30] Yu Cao, et al. Reducing the Model Order of Deep Neural Networks Using Information Theory, 2016, ISVLSI.
[31] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[32] Ali Farhadi, et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping, 2020, arXiv.
[33] Alexander M. Rush, et al. Parameter-Efficient Transfer Learning with Diff Pruning, 2021, ACL/IJCNLP.
[34] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[35] Iryna Gurevych, et al. MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer, 2020, EMNLP.
[36] Wanxiang Che, et al. Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting, 2020, EMNLP.
[37] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[38] Massimiliano Pontil, et al. Distance-Based Regularisation of Deep Networks for Fine-Tuning, 2021, ICLR.