Mind the Gap: Assessing Temporal Generalization in Neural Language Models
Phil Blunsom | Dani Yogatama | Angeliki Lazaridou | Kris Cao | Elena Gribovskaya | Adhiguna Kuncoro | Adam Liska | Cyprien de Masson d'Autume | Devang Agrawal | Susannah Young | Tomas Kocisky | Sebastian Ruder | Tayfun Terzi | Mai Gimenez
[1] Ankit Singh Rawat, et al. Modifying Memories in Transformer Models, 2020, ArXiv.
[2] Thorsten Brants, et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.
[3] Omer Levy, et al. Generalization through Memorization: Nearest Neighbor Language Models, 2020, ICLR.
[4] Hal Daumé, et al. Frustratingly Easy Domain Adaptation, 2007, ACL.
[5] Yoav Goldberg, et al. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models, 2021, ACL.
[6] Shruti Rijhwani, et al. Temporally-Informed Analysis of Named Entity Recognition, 2020, ACL.
[7] Danqi Chen, et al. Dense Passage Retrieval for Open-Domain Question Answering, 2020, EMNLP.
[8] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, ArXiv.
[9] Chen Liang, et al. Carbon Emissions and Large Neural Network Training, 2021, ArXiv.
[10] Emily M. Bender, et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, 2021, FAccT.
[11] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[12] Dawn Song, et al. Pretrained Transformers Improve Out-of-Distribution Robustness, 2020, ACL.
[13] Vicente Ordonez, et al. Bias and Fairness in Natural Language Processing, 2019, EMNLP/IJCNLP.
[14] Dan Klein, et al. Cross-Domain Generalization of Neural Constituency Parsers, 2019, ACL.
[15] Pieter Abbeel, et al. Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments, 2017, ICLR.
[16] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[17] Fabio Petroni, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2020, NeurIPS.
[18] Ali Farhadi, et al. Defending Against Neural Fake News, 2019, NeurIPS.
[19] Steve Renals, et al. Dynamic Evaluation of Transformer Language Models, 2019, ArXiv.
[20] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.
[21] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[22] Xinlei Chen, et al. Never-Ending Learning, 2012, ECAI.
[23] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[24] Sebastian Ruder, et al. Episodic Memory in Lifelong Language Learning, 2019, NeurIPS.
[25] Mohit Bansal, et al. Adversarial NLI: A New Benchmark for Natural Language Understanding, 2020, ACL.
[26] James Kirkpatrick, et al. Overcoming catastrophic forgetting in neural networks, 2016.
[27] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[28] John Blitzer, et al. Domain Adaptation with Structural Correspondence Learning, 2006, EMNLP.
[29] Ming-Wei Chang, et al. Retrieval Augmented Language Model Pre-Training, 2020, ICML.
[30] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[31] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[32] Ashwin Lall, et al. Exponential Reservoir Sampling for Streaming Language Models, 2014, ACL.
[33] Andrew McCallum, et al. Energy and Policy Considerations for Deep Learning in NLP, 2019, ACL.
[34] Andrei A. Rusu, et al. Embracing Change: Continual Learning in Deep Neural Networks, 2020, Trends in Cognitive Sciences.
[35] A. Bifet, et al. Early Drift Detection Method, 2005.
[36] Ryan Cotterell, et al. What Kind of Language Is Hard to Language-Model?, 2019, ACL.
[37] Chong Wang, et al. Dynamic Language Models for Streaming Text, 2014, TACL.
[38] Anders Søgaard, et al. Sentiment analysis under temporal shift, 2018, WASSA@EMNLP.
[39] Fan-Keng Sun, et al. LAMOL: LAnguage MOdeling for Lifelong Language Learning, 2020, ICLR.
[40] Katrin Erk, et al. Deep Neural Models of Semantic Shift, 2018, NAACL-HLT.
[41] Navdeep Jaitly, et al. Pointer Networks, 2015, NIPS.
[42] Dani Yogatama, et al. Adaptive Semiparametric Language Models, 2021, Transactions of the Association for Computational Linguistics.
[43] Suresh Venkatasubramanian, et al. Streaming for large scale NLP: Language Modeling, 2009, NAACL.
[44] Steve Renals, et al. Dynamic Evaluation of Neural Sequence Models, 2017, ICML.
[45] Kyunghyun Cho, et al. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine, 2017, ArXiv.
[46] Percy Liang, et al. Distributionally Robust Language Modeling, 2019, EMNLP.
[47] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.
[48] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[49] Guangquan Zhang, et al. Learning under Concept Drift: A Review, 2019, IEEE Transactions on Knowledge and Data Engineering.
[50] Yue Zhang, et al. Deep Learning for Event-Driven Stock Prediction, 2015, IJCAI.
[51] Marc'Aurelio Ranzato, et al. Gradient Episodic Memory for Continual Learning, 2017, NIPS.
[52] Gerhard Widmer, et al. Learning in the Presence of Concept Drift and Hidden Contexts, 1996, Machine Learning.
[53] Isabelle Augenstein, et al. Back to the Future - Temporal Adaptation of Text Representations, 2020, AAAI.
[54] Eunsol Choi, et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension, 2017, ACL.
[55] Dirk Hovy, et al. Crowdsourcing and annotating NER for Twitter #drift, 2014, LREC.
[56] Sebastian Thrun, et al. Lifelong robot learning, 1993, Robotics Auton. Syst.
[57] Zi Yin, et al. The Global Anchor Method for Quantifying Linguistic Shifts and Domain Adaptation, 2018, NeurIPS.
[58] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[59] Andreas Vlachos, et al. Automated Fact Checking: Task Formulations, Methods and Future Directions, 2018, COLING.
[60] Sebastian Riedel, et al. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets, 2020, EACL.
[61] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[62] Jure Leskovec, et al. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change, 2016, ACL.
[63] Philip Bachman, et al. NewsQA: A Machine Comprehension Dataset, 2016, Rep4NLP@ACL.
[64] Terrence Szymanski, et al. Temporal Word Analogies: Identifying Lexical Replacement with Diachronic Word Embeddings, 2017, ACL.
[65] André F. T. Martins, et al. Adaptively Sparse Transformers, 2019, EMNLP.
[66] R. French. Catastrophic forgetting in connectionist networks, 1999, Trends in Cognitive Sciences.
[67] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[68] Jianfeng Gao, et al. Domain Adaptation via Pseudo In-Domain Data Selection, 2011, EMNLP.
[69] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[70] Christian Hansen, et al. MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims, 2019, EMNLP.
[71] Chris Callison-Burch, et al. Stream-based Translation Models for Statistical Machine Translation, 2010, NAACL.
[72] Anders Søgaard, et al. We Need To Talk About Random Splits, 2020, EACL.
[73] Christopher Potts, et al. DynaSent: A Dynamic Benchmark for Sentiment Analysis, 2020, ACL.
[74] Charles Foster, et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling, 2020, ArXiv.
[75] Doug Downey, et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks, 2020, ACL.
[76] Sandro Pezzelle, et al. The LAMBADA dataset: Word prediction requiring a broad discourse context, 2016, ACL.
[77] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[78] Artem Babenko, et al. Editable Neural Networks, 2020, ICLR.
[79] Nicola De Cao, et al. Editing Factual Knowledge in Language Models, 2021, EMNLP.
[80] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[81] Bernard Mérialdo, et al. A Dynamic Language Model for Speech Recognition, 1991, HLT.
[82] Chong Wang, et al. Continuous Time Dynamic Topic Models, 2008, UAI.
[83] Andrew Zisserman, et al. Video Action Transformer Network, 2019, CVPR.
[84] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[85] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[86] Shai Ben-David, et al. Detecting Change in Data Streams, 2004, VLDB.
[87] Isabelle Augenstein, et al. Back to the Future - Sequential Alignment of Text Representations, 2019, ArXiv.