The Effect of Masking Strategies on Knowledge Retention by Language Models