[1] Nikos Komodakis, et al. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, 2016, ICLR.
[2] Andreas Vlachos, et al. FEVER: a Large-scale Dataset for Fact Extraction and VERification, 2018, NAACL.
[3] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[4] Richard Socher, et al. Dynamic Coattention Networks for Question Answering, 2016, ICLR.
[5] Luca Antiga, et al. Automatic Differentiation in PyTorch, 2017.
[6] Zhi Jin, et al. Distilling Word Embeddings: An Encoding Approach, 2015, CIKM.
[7] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[8] Bowen Zhou, et al. LSTM-based Deep Learning Models for Non-factoid Answer Selection, 2015, ArXiv.
[9] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[10] Rich Caruana, et al. Model Compression, 2006, KDD '06.
[11] Alexander M. Rush, et al. Sequence-Level Knowledge Distillation, 2016, EMNLP.
[12] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[13] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[14] Jason Weston, et al. Reading Wikipedia to Answer Open-Domain Questions, 2017, ACL.
[15] Yoshua Bengio, et al. FitNets: Hints for Thin Deep Nets, 2014, ICLR.
[16] Nan Yang, et al. Attention-Guided Answer Distillation for Machine Reading Comprehension, 2018, EMNLP.