Lidong Bing | Hai Ye | Qingyu Tan | Ruidan He | Jia-Wei Low | Luo Si | Liying Cheng | Linlin Liu | Bosheng Ding