High Performance Natural Language Processing