EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
[1] Zhe Gan, et al. Playing Lottery Tickets with Vision and Language, 2021, AAAI.
[2] Shiyu Chang, et al. The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models, 2020, CVPR.
[3] Yu Cheng, et al. Contrastive Distillation on Intermediate Representations for Language Model Compression, 2020, EMNLP.
[4] Yang Zhang, et al. The Lottery Ticket Hypothesis for Pre-trained BERT Networks, 2020, NeurIPS.
[5] Anna Rumshisky, et al. When BERT Plays the Lottery, All Tickets Are Winning, 2020, EMNLP.
[6] Jimmy J. Lin, et al. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference, 2020, ACL.
[7] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[8] Michael Carbin, et al. Comparing Rewinding and Fine-tuning in Neural Network Pruning, 2020, ICLR.
[9] Dan Klein, et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, arXiv.
[10] Mitchell A. Gordon, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP.
[11] P. S. Castro, et al. Rigging the Lottery: Making All Tickets Winners, 2019, ICML.
[12] Peter J. Liu, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, JMLR.
[13] J. Scott McCarley, et al. Pruning a BERT-based Question Answering Model, 2019, arXiv.
[14] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, EMC2-NIPS Workshop.
[15] Ziheng Wang, et al. Structured Pruning of Large Language Models, 2019, EMNLP.
[16] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[17] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.
[18] Yue Wang, et al. Drawing early-bird tickets: Towards more efficient training of deep networks, 2019, ICLR.
[19] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[20] Xin Jiang, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2019, Findings of EMNLP.
[21] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, arXiv.
[22] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[23] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[24] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[25] Yuandong Tian, et al. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP, 2019, ICLR.
[26] Yuandong Tian, et al. One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers, 2019, NeurIPS.
[27] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[28] Jason Yosinski, et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, 2019, NeurIPS.
[29] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[30] James Demmel, et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes, 2019, ICLR.
[31] Samuel R. Bowman, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[32] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[33] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[34] Zhiqiang Shen, et al. Learning Efficient Convolutional Networks through Network Slimming, 2017, ICCV.
[35] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[36] Yiran Chen, et al. Learning Structured Sparsity in Deep Neural Networks, 2016, NIPS.
[37] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[38] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[39] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[40] Zhe Gan, et al. Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly, 2021, arXiv.
[41] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.