Bo Wu | Yuxiong He | Minjia Zhang | Connor Holmes
[1] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[2] Olatunji Ruwase, et al. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[4] Michael R. Lyu, et al. BinaryBERT: Pushing the Limit of BERT Quantization, 2020, ACL.
[5] Yiran Chen, et al. Learning Structured Sparsity in Deep Neural Networks, 2016, NIPS.
[6] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[7] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[8] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[9] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[10] Jiayu Li, et al. Progressive Weight Pruning of Deep Neural Networks using ADMM, 2018, ArXiv.
[11] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[12] Guy Jacob, et al. Neural Network Distiller: A Python Package For DNN Compression Research, 2019, ArXiv.
[13] Kevin Duh, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP@ACL.
[14] Jiayu Li, et al. ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers, 2018, ASPLOS.
[15] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[16] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[17] Asit K. Mishra, et al. Accelerating Sparse Deep Neural Networks, 2021, ArXiv.
[18] Emily Denton, et al. Characterising Bias in Compressed Models, 2020, ArXiv.
[19] Yanzhi Wang, et al. Reweighted Proximal Pruning for Large-Scale Language Representation, 2019, ArXiv.
[20] Song Han, et al. Deep compression and EIE: Efficient inference engine on compressed deep neural network, 2016, 2016 IEEE Hot Chips 28 Symposium (HCS).
[21] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[22] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[23] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[24] Zhijie Zhang, et al. Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch, 2021, ICLR.
[25] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[26] Alan W Black, et al. Measuring Bias in Contextualized Word Representations, 2019, Proceedings of the First Workshop on Gender Bias in Natural Language Processing.
[27] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[28] Shiyu Chang, et al. The Lottery Ticket Hypothesis for Pre-trained BERT Networks, 2020, NeurIPS.
[29] Wei Niu, et al. PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices, 2020, AAAI.
[30] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[31] Emily Denton, et al. Social Biases in NLP Models as Barriers for Persons with Disabilities, 2020, ACL.
[32] Quanlu Zhang, et al. LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression, 2020, COLING.
[33] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.