Lifelong Language Pretraining with Distribution-Specialized Experts
Zhifeng Chen | James Laudon | Yanping Huang | Nan Du | Wuyang Chen | Yanqi Zhou | Claire Cui
[1] Sida I. Wang, et al. On Continual Model Refinement in Out-of-Distribution Data Streams, 2022, ACL.
[2] Quoc V. Le, et al. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts, 2021, ICML.
[3] Ankur Bapna, et al. Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference, 2021, EMNLP.
[4] Noah A. Smith, et al. DEMix Layers: Disentangling Domains for Modular Language Modeling, 2021, NAACL.
[5] Stefan Wermter, et al. DRILL: Dynamic Representations for Imbalanced Lifelong Learning, 2021, ICANN.
[6] Bill Yuchen Lin, et al. Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning, 2021, EMNLP.
[7] Xuezhi Wang, et al. Continual Learning for Text Classification with Information Disentanglement Based Regularization, 2021, NAACL.
[8] Noam M. Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, J. Mach. Learn. Res.
[9] Magdalena Biesialska, et al. Continual Lifelong Learning in Natural Language Processing: A Survey, 2020, COLING.
[10] Ekaterina Shutova, et al. Meta-Learning with Sparse Experience Replay for Lifelong Language Learning, 2020, ArXiv.
[11] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[12] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[13] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[14] Dustin Tran, et al. BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning, 2020, ICLR.
[15] Tomohide Shibata. Understand It in 5 Minutes!? Skim-Reading Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[16] Quoc V. Le, et al. Towards a Human-like Open-Domain Chatbot, 2020, ArXiv.
[17] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[18] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[19] Hung-yi Lee, et al. LAMOL: LAnguage MOdeling for Lifelong Language Learning, 2019, ICLR.
[20] Tinne Tuytelaars, et al. Online Continual Learning with Maximally Interfered Retrieval, 2019, ArXiv.
[21] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[22] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[23] Sebastian Ruder, et al. Episodic Memory in Lifelong Language Learning, 2019, NeurIPS.
[24] Philip S. Yu, et al. BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis, 2019, NAACL.
[25] Hong Wang, et al. Sentence Embedding Alignment for Lifelong Relation Extraction, 2019, NAACL.
[26] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[27] David Rolnick, et al. Experience Replay for Continual Learning, 2018, NeurIPS.
[28] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, NeurIPS.
[29] Dustin Tran, et al. Mesh-TensorFlow: Deep Learning for Supercomputers, 2018, NeurIPS.
[30] Xiaodong Liu, et al. ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension, 2018, ArXiv.
[31] Marc'Aurelio Ranzato, et al. Efficient Lifelong Learning with A-GEM, 2018, ICLR.
[32] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[33] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[34] Svetlana Lazebnik, et al. PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning, 2017, CVPR 2018.
[35] Sung Ju Hwang, et al. Lifelong Learning with Dynamically Expandable Networks, 2017, ICLR.
[36] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[37] Marc'Aurelio Ranzato, et al. Gradient Episodic Memory for Continual Learning, 2017, NIPS.
[38] Jiwon Kim, et al. Continual Learning with Deep Generative Replay, 2017, NIPS.
[39] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[40] Andrei A. Rusu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[41] Christoph H. Lampert, et al. iCaRL: Incremental Classifier and Representation Learning, 2016, CVPR 2017.
[42] Karin M. Verspoor, et al. Findings of the 2016 Conference on Machine Translation, 2016, WMT.
[43] Derek Hoiem, et al. Learning without Forgetting, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[44] Tianqi Chen, et al. Net2Net: Accelerating Learning via Knowledge Transfer, 2015, ICLR.
[45] Quoc V. Le, et al. Semi-supervised Sequence Learning, 2015, NIPS.
[46] Sanja Fidler, et al. Skip-Thought Vectors, 2015, NIPS.
[47] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[48] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[49] Honglak Lee, et al. Online Incremental Feature Learning with Denoising Autoencoders, 2012, AISTATS.
[50] Geoffrey E. Hinton, et al. Generating Text with Recurrent Neural Networks, 2011, ICML.
[51] Zornitsa Kozareva, et al. SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning, 2011, *SEMEVAL.
[52] Anthony V. Robins, et al. Catastrophic Forgetting, Rehearsal and Pseudorehearsal, 1995, Connect. Sci.
[53] Aman Hussain, et al. Towards a robust experimental framework and benchmark for lifelong language learning, 2021, NeurIPS Datasets and Benchmarks.
[54] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[55] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[56] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.