DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter