Structured Pruning of a BERT-based Question Answering Model
[1] Dan Klein, et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, ArXiv.
[2] Peter J. Liu, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[3] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[4] Ziheng Wang, et al. Structured Pruning of Large Language Models, 2019, EMNLP.
[5] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[6] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[7] Ming-Wei Chang, et al. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019.
[8] Xin Jiang, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2019, Findings of EMNLP.
[9] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[10] G. P. Shrivatsa Bhargav, et al. Span Selection Pre-training for Question Answering, 2019, ACL.
[11] Naveen Arivazhagan, et al. Small and Practical BERT Models for Sequence Labeling, 2019, EMNLP.
[12] Anna Rumshisky, et al. Revealing the Dark Secrets of BERT, 2019, EMNLP.
[13] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[14] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[15] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[16] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[17] Daxin Jiang, et al. Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System, 2019, ArXiv.
[18] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[19] Erich Elsen, et al. The State of Sparsity in Deep Neural Networks, 2019, ArXiv.
[20] Percy Liang, et al. Know What You Don't Know: Unanswerable Questions for SQuAD, 2018, ACL.
[21] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[22] Diederik P. Kingma, et al. Learning Sparse Neural Networks through L0 Regularization, 2017, ICLR.
[23] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[24] Yee Whye Teh, et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, 2016, ICLR.
[25] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[26] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[27] Diederik P. Kingma, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[28] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.