Victor Sanh | Lysandre Debut | Julien Chaumond | Thomas Wolf
[1] Rich Caruana, et al. Model Compression, 2006, KDD '06.
[2] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.
[3] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[4] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, IEEE International Conference on Computer Vision (ICCV).
[5] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[6] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[7] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[8] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[9] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[10] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[11] Naveen Arivazhagan, et al. Small and Practical BERT Models for Sequence Labeling, 2019, EMNLP.
[12] Daxin Jiang, et al. Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System, 2019, ArXiv.
[13] Debajyoti Chatterjee. Making Neural Machine Reading Comprehension Faster, 2019, ArXiv.
[14] Ming-Wei Chang, et al. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation, 2019, ArXiv.
[15] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[16] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[17] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[18] Andrew McCallum, et al. Energy and Policy Considerations for Deep Learning in NLP, 2019, ACL.
[19] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[20] Samuel R. Bowman, et al. jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models, 2020, ACL.
[21] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.