Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT
[1] Jimmy J. Lin, et al. Rethinking Complex Neural Network Architectures for Document Classification, 2019, NAACL.
[2] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[3] Yiming Yang, et al. Deep Learning for Extreme Multi-label Text Classification, 2017, SIGIR.
[4] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, arXiv.
[5] Sholom M. Weiss, et al. Automated learning of decision rules for text categorization, 1994, TOIS.
[6] Rich Caruana, et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.
[7] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[8] Wei Wu, et al. SGM: Sequence Generation Model for Multi-label Classification, 2018, COLING.
[9] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[10] Diyi Yang, et al. Hierarchical Attention Networks for Document Classification, 2016, NAACL.
[11] Yoon Kim. Convolutional Neural Networks for Sentence Classification, 2014, EMNLP.
[12] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[13] Tong Zhang, et al. Deep Pyramid Convolutional Neural Networks for Text Categorization, 2017, ACL.
[14] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[15] Andrew McCallum, et al. Energy and Policy Considerations for Deep Learning in NLP, 2019, ACL.
[16] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, arXiv.
[17] Yann LeCun, et al. Very Deep Convolutional Networks for Text Classification, 2016, EACL.