Large Product Key Memory for Pretrained Language Models