Anamitra R. Choudhury | Yogish Sabharwal | Venkatesan T. Chakaravarthy | Ashish Verma | Saurabh Goyal | Saurabh Manish Raje
[1] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.
[2] Guokun Lai, et al. RACE: Large-scale ReAding Comprehension Dataset From Examinations, 2017, EMNLP.
[3] J. Scott McCarley, et al. Pruning a BERT-based Question Answering Model, 2019, ArXiv.
[4] Jaewoo Kang, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining, 2019, Bioinformatics.
[5] Xiangyu Zhang, et al. Channel Pruning for Accelerating Very Deep Neural Networks, 2017, ICCV.
[6] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[7] Eunhyeok Park, et al. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications, 2015, ICLR.
[8] André F. T. Martins, et al. Sparse Sequence-to-Sequence Models, 2019, ACL.
[9] André F. T. Martins, et al. Adaptively Sparse Transformers, 2019, EMNLP.
[10] Mirella Lapata, et al. Text Summarization with Pretrained Encoders, 2019, EMNLP.
[11] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[12] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[13] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[14] Xiaodong Liu, et al. Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding, 2019, ArXiv.
[15] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[16] Vineeth N. Balasubramanian, et al. Deep Model Compression: Distilling Knowledge from Noisy Teachers, 2016, ArXiv.
[17] Ming Yang, et al. Compressing Deep Convolutional Networks using Vector Quantization, 2014, ArXiv.
[18] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[19] Dmitry P. Vetrov, et al. Variational Dropout Sparsifies Deep Neural Networks, 2017, ICML.
[20] Ramón Fernández Astudillo, et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification, 2016, ICML.
[21] Kurt Keutzer, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2020, AAAI.
[22] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[23] Xuancheng Ren, et al. Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection, 2019, ArXiv.
[24] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[25] Timo Aila, et al. Pruning Convolutional Neural Networks for Resource Efficient Inference, 2016, ICLR.
[26] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[27] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[28] Misha Denil, et al. Predicting Parameters in Deep Learning, 2014.
[29] Ziheng Wang, et al. Structured Pruning of Large Language Models, 2019, EMNLP.