[1] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[2] Ming-Wei Chang, et al. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation, 2019, ArXiv.
[3] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[4] Ji Liu, et al. Global Sparse Momentum SGD for Pruning Very Deep Neural Networks, 2019, NeurIPS.
[5] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[6] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[7] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[8] Preslav Nakov, et al. Poor Man's BERT: Smaller and Faster Transformer Models, 2020, ArXiv.
[9] Yuandong Tian, et al. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP, 2019, ICLR.
[10] Ali Farhadi, et al. What's Hidden in a Randomly Weighted Neural Network?, 2020, CVPR.
[11] J. Scott McCarley, et al. Pruning a BERT-based Question Answering Model, 2019, ArXiv.
[12] Yanzhi Wang, et al. Reweighted Proximal Pruning for Large-Scale Language Representation, 2019, ArXiv.
[13] Lucas Theis, et al. Faster gaze prediction with dense networks and Fisher pruning, 2018, ArXiv.
[14] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[15] Gintare Karolina Dziugaite, et al. Linear Mode Connectivity and the Lottery Ticket Hypothesis, 2019, ICML.
[16] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[17] Julien Mairal, et al. Structured sparsity through convex optimization, 2011, ArXiv.
[18] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[19] Anupam Datta, et al. Gender Bias in Neural Natural Language Processing, 2018, Logic, Language, and Security.
[20] Erich Elsen, et al. The State of Sparsity in Deep Neural Networks, 2019, ArXiv.
[21] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[22] Sameer Singh, et al. Universal Adversarial Triggers for Attacking and Analyzing NLP, 2019, EMNLP.
[23] Max Welling, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.
[24] Yurong Chen, et al. Dynamic Network Surgery for Efficient DNNs, 2016, NIPS.
[25] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[26] Ziheng Wang, et al. Structured Pruning of Large Language Models, 2019, EMNLP.
[27] Gintare Karolina Dziugaite, et al. The Lottery Ticket Hypothesis at Scale, 2019, ArXiv.
[28] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[29] Chishu Shibata. Understand in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[30] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[31] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, ISCA.
[32] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[33] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[34] Svetlana Lazebnik, et al. Piggyback: Adding Multiple Tasks to a Single, Fixed Network by Learning to Mask, 2018, ArXiv.
[35] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[36] Kevin Duh, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP@ACL.
[37] Suyog Gupta, et al. To prune, or not to prune: exploring the efficacy of pruning for model compression, 2017, ICLR.
[38] Edouard Grave, et al. Training with Quantization Noise for Extreme Model Compression, 2020, ICLR.
[39] Rich Caruana, et al. Model compression, 2006, KDD.
[40] Gregory J. Wolff, et al. Optimal Brain Surgeon: Extensions and performance comparisons, 1993, NIPS.
[41] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[42] Song Han, et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices, 2018, ECCV.
[43] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, ArXiv.
[44] Yonatan Belinkov, et al. Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias, 2020, ArXiv.
[45] Mark Horowitz. Computing's energy problem (and what we can do about it), 2014, ISSCC.
[46] Ming Yang, et al. Compressing Deep Convolutional Networks using Vector Quantization, 2014, ArXiv.
[47] Luke Zettlemoyer, et al. Sparse Networks from Scratch: Faster Training without Losing Performance, 2019, ArXiv.
[48] Dan Klein, et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, ArXiv.
[49] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[50] Thomas Wolf, et al. Transfer Learning in Natural Language Processing, 2019, NAACL.
[51] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, Journal of Machine Learning Research.
[52] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Foundations and Trends in Optimization.
[53] Philip H. S. Torr, et al. SNIP: Single-shot Network Pruning based on Connection Sensitivity, 2018, ICLR.
[54] Andrew McCallum, et al. Energy and Policy Considerations for Deep Learning in NLP, 2019, ACL.