Alexander M. Rush | Ella Charlaix | Victor Sanh | François Lagunas
[1] Alexander M. Rush, et al. Movement Pruning: Adaptive Sparsity by Fine-Tuning, 2020, NeurIPS.
[2] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[3] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[4] Alexander M. Rush, et al. Pre-trained Summarization Distillation, 2020, ArXiv.
[5] Quanlu Zhang, et al. LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression, 2020, COLING.
[6] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[7] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, ArXiv.
[8] Preslav Nakov, et al. Poor Man's BERT: Smaller and Faster Transformer Models, 2020, ArXiv.
[9] Percy Liang, et al. Know What You Don’t Know: Unanswerable Questions for SQuAD, 2018, ACL.
[10] Ming-Wei Chang, et al. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation, 2019, ArXiv.
[11] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[12] Ali Farhadi, et al. What’s Hidden in a Randomly Weighted Neural Network?, 2020, CVPR.
[13] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[14] H. Kay. Teaching Machines, 1961, Nature.
[15] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[16] Avirup Sil, et al. Structured Pruning of a BERT-based Question Answering Model, 2019.
[17] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[18] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[19] Kevin Duh, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP@ACL.
[20] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[21] Michael Carbin, et al. The Lottery Ticket Hypothesis: Training Pruned Neural Networks, 2018, ArXiv.
[22] Hongbo Zhang, et al. Quora Question Pairs, 2017.
[23] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[24] Dan Klein, et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, ArXiv.
[25] Hany Hassan Awadalla, et al. FastFormers: Highly Efficient Transformer Models for Natural Language Understanding, 2020, SUSTAINLP.
[26] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[27] Alexander M. Rush, et al. Parameter-Efficient Transfer Learning with Diff Pruning, 2021, ACL/IJCNLP.
[28] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.
[29] Ziheng Wang, et al. Structured Pruning of Large Language Models, 2020, EMNLP.
[30] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[31] Phil Blunsom, et al. Teaching Machines to Read and Comprehend, 2015, NIPS.
[32] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[33] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[34] Max Welling, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.
[35] Matthijs Douze, et al. FastText.zip: Compressing text classification models, 2016, ArXiv.
[36] Christopher D. Manning, et al. Compression of Neural Machine Translation Models via Pruning, 2016, CoNLL.
[37] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.