暂无分享,去创建一个
[1] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[2] Tom Fawcett,et al. An introduction to ROC analysis , 2006, Pattern Recognit. Lett..
[3] Chao Zhang,et al. Low-resolution Visual Recognition via Deep Feature Distillation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Qun Liu,et al. TinyBERT: Distilling BERT for Natural Language Understanding , 2020, EMNLP.
[5] Ed H. Chi,et al. Understanding and Improving Knowledge Distillation , 2020, ArXiv.
[6] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[7] Yiming Yang,et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices , 2020, ACL.
[8] Jimmy J. Lin,et al. Natural Language Generation for Effective Knowledge Distillation , 2019, EMNLP.
[9] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[10] Yu Cheng,et al. Patient Knowledge Distillation for BERT Model Compression , 2019, EMNLP.
[11] Cristian Onose,et al. Sentence-Level Propaganda Detection in News Articles with Transfer Learning and BERT-BiLSTM-Capsule Model , 2019, EMNLP.
[12] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[13] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[14] Jimmy J. Lin,et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks , 2019, ArXiv.
[15] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[16] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[17] Ankur Bapna,et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.
[18] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[19] Tie-Yan Liu,et al. Learning to rank for information retrieval , 2009, SIGIR.
[20] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[21] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..
[22] Koichi Shinoda,et al. Sequence-level Knowledge Distillation for Model Compression of Attention-based Sequence-to-sequence Speech Recognition , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Hermann Ney,et al. A Comparison of Transformer and LSTM Encoder Decoder Models for ASR , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[24] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.