Zhe Gan | Jingjing Liu | Shuohang Wang | Zhangyang Wang | Yu Cheng | Xiaohan Chen
[1] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[2] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Yiran Chen, et al. Learning Structured Sparsity in Deep Neural Networks, 2016, NIPS.
[5] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[6] Zhiqiang Shen, et al. Learning Efficient Convolutional Networks through Network Slimming, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[7] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[8] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[9] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[10] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[11] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[12] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[13] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[14] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[15] Yuandong Tian, et al. One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers, 2019, NeurIPS.
[16] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[17] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[18] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[19] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[20] Jason Yosinski, et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, 2019, NeurIPS.
[21] J. Scott McCarley, et al. Pruning a BERT-based Question Answering Model, 2019, ArXiv.
[22] Yu Cheng, et al. Contrastive Distillation on Intermediate Representations for Language Model Compression, 2020, EMNLP.
[23] Xin Jiang, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2019, Findings of EMNLP.
[24] Dan Klein, et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, ArXiv.
[25] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[26] Michael Carbin, et al. Comparing Rewinding and Fine-tuning in Neural Network Pruning, 2019, ICLR.
[27] P. S. Castro, et al. Rigging the Lottery: Making All Tickets Winners, 2019, ICML.
[28] Yue Wang, et al. Drawing early-bird tickets: Towards more efficient training of deep networks, 2019, ICLR.
[29] Yuandong Tian, et al. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP, 2019, ICLR.
[30] Yang Zhang, et al. The Lottery Ticket Hypothesis for Pre-trained BERT Networks, 2020, NeurIPS.
[31] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[32] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[33] Ziheng Wang, et al. Structured Pruning of Large Language Models, 2019, EMNLP.
[34] Jimmy J. Lin, et al. DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference, 2020, ACL.
[35] Anna Rumshisky, et al. When BERT Plays the Lottery, All Tickets Are Winning, 2020, EMNLP.
[36] James Demmel, et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes, 2019, ICLR.
[37] Mitchell A. Gordon, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP.
[38] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[39] Zhe Gan, et al. Playing Lottery Tickets with Vision and Language, 2021, AAAI.
[40] Shiyu Chang, et al. The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] David P. Kreil, et al. Hopfield Networks is All You Need, 2020, ICLR.
[42] Zhe Gan, et al. Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly, 2021, ArXiv.