PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Saurabh Goyal | Anamitra R. Choudhury | Saurabh M. Raje | Venkatesan T. Chakaravarthy | Yogish Sabharwal | Ashish Verma