Distilling Task-Specific Knowledge from BERT via Adversarial Belief Matching
[1] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[2] Xin Jiang, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2019, Findings of EMNLP.
[3] Razvan Pascanu, et al. Sobolev Training for Neural Networks, 2017, NIPS.
[4] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[5] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[6] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[7] Ankur P. Parikh, et al. Thieves on Sesame Street! Model Extraction of BERT-based APIs, 2019, ICLR.
[8] Amos Storkey, et al. Zero-shot Knowledge Transfer via Adversarial Belief Matching, 2019, NeurIPS.
[9] Yoshua Bengio, et al. FitNets: Hints for Thin Deep Nets, 2014, ICLR.
[10] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.