Shiyu Chang | Yang Zhang | Tianlong Chen | Zhangyang Wang | Sijia Liu | Michael Carbin | Jonathan Frankle
[1] Kaiming He, et al. Momentum Contrast for Unsupervised Visual Representation Learning, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Philip H. S. Torr, et al. SNIP: Single-shot Network Pruning based on Connection Sensitivity, 2018, ICLR.
[3] Yue Wang, et al. Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions, 2018, ICML.
[4] Avirup Sil, et al. Structured Pruning of a BERT-based Question Answering Model, 2019.
[5] Roger B. Grosse, et al. Picking Winning Tickets Before Training by Preserving Gradient Flow, 2020, ICLR.
[6] Dacheng Tao, et al. Adversarial Learning of Portable Student Networks, 2018, AAAI.
[7] Erich Elsen, et al. The State of Sparsity in Deep Neural Networks, 2019, arXiv.
[8] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[9] Gintare Karolina Dziugaite, et al. Linear Mode Connectivity and the Lottery Ticket Hypothesis, 2019, ICML.
[10] Yuandong Tian, et al. One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers, 2019, NeurIPS.
[11] Subhabrata Mukherjee, et al. Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data, 2019.
[12] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[13] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[14] Shrey Desai, et al. Evaluating Lottery Tickets Under Distributional Shifts, 2019, EMNLP.
[15] Qi Tian, et al. Data-Free Learning of Student Networks, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Chao Xu, et al. LegoNet: Efficient Convolutional Neural Networks with Lego Filters, 2019, ICML.
[17] Yu Cheng, et al. Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Rahul Mehta, et al. Sparse Transfer Learning via Winning Lottery Tickets, 2019, arXiv.
[19] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[20] Andrew McCallum, et al. Energy and Policy Considerations for Deep Learning in NLP, 2019, ACL.
[21] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.
[22] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[23] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[24] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[25] Thomas Wolf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[26] Yin Yang, et al. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT, 2020, Transactions of the Association for Computational Linguistics.
[27] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[28] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[29] Huan Wang, et al. MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models, 2019.
[30] Geoffrey E. Hinton, et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.
[31] David J. Schwab, et al. The Early Phase of Neural Network Training, 2020, ICLR.
[32] Erich Elsen, et al. Fast Sparse ConvNets, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Luke Zettlemoyer, et al. Sparse Networks from Scratch: Faster Training without Losing Performance, 2019, arXiv.
[34] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[35] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[36] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[37] Peter Szolovits, et al. Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment, 2019, arXiv.
[38] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[39] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, arXiv.
[40] Michael Carbin, et al. Comparing Rewinding and Fine-tuning in Neural Network Pruning, 2019, ICLR.
[41] Peter Szolovits, et al. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment, 2020, AAAI.
[42] Anna Rumshisky, et al. When BERT Plays the Lottery, All Tickets Are Winning, 2020, EMNLP.
[43] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[44] Jose Javier Gonzalez Ortiz, et al. What is the State of Neural Network Pruning?, 2020, MLSys.
[45] Erich Elsen, et al. Rigging the Lottery: Making All Tickets Winners, 2020, ICML.
[46] Yuandong Tian, et al. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP, 2019, ICLR.
[47] Mingjie Sun, et al. Rethinking the Value of Network Pruning, 2018, ICLR.
[48] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[49] Yanzhi Wang, et al. Reweighted Proximal Pruning for Large-Scale Language Representation, 2019, arXiv.
[50] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[51] Yue Wang, et al. Drawing early-bird tickets: Towards more efficient training of deep networks, 2019, ICLR.
[52] Ahmed Hassan Awadallah, et al. Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data, 2019, arXiv.
[53] Yoshua Bengio, et al. Why Does Unsupervised Pre-training Help Deep Learning?, 2010, AISTATS.
[54] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.