Online Knowledge Distillation with Diverse Peers
Defang Chen | Jian-Ping Mei | Can Wang | Yan Feng | Chun Chen
[1] Ming Dong, et al. Coupled End-to-End Transfer Learning with Generalized Fisher Information, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Timo Aila, et al. Temporal Ensembling for Semi-Supervised Learning, 2016, ICLR.
[3] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[5] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[7] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[9] Huchuan Lu, et al. Deep Mutual Learning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Yoshua Bengio, et al. FitNets: Hints for Thin Deep Nets, 2014, ICLR.
[12] Neil D. Lawrence, et al. Variational Information Distillation for Knowledge Transfer, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[14] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[15] Geoffrey E. Hinton, et al. Regularizing Neural Networks by Penalizing Confident Output Distributions, 2017, ICLR.
[16] Geoffrey E. Hinton, et al. Large Scale Distributed Neural Network Training through Online Distillation, 2018, ICLR.
[17] Han Zhang, et al. Self-Attention Generative Adversarial Networks, 2018, ICML.
[18] Rich Caruana, et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.
[19] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[20] Guocong Song, et al. Collaborative Learning for Deep Neural Networks, 2018, NeurIPS.
[21] Junmo Kim, et al. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Stefano Soatto, et al. Entropy-SGD: Biasing Gradient Descent into Wide Valleys, 2016, ICLR.
[23] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[24] Ludmila I. Kuncheva, et al. Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy, 2003, Machine Learning.
[25] Song Han, et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[26] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[27] Rich Caruana, et al. Model Compression, 2006, KDD '06.
[28] Misha Denil, et al. Predicting Parameters in Deep Learning, 2014.
[29] Xu Lan, et al. Knowledge Distillation by On-the-Fly Native Ensemble, 2018, NeurIPS.
[30] Pietro Liò, et al. Graph Attention Networks, 2017, ICLR.
[31] Dan Alistarh, et al. Model Compression via Distillation and Quantization, 2018, ICLR.
[32] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).