Distributed Distillation for On-Device Learning
[1] John N. Tsitsiklis, et al. Problems in decentralized decision making and computation, 1984.
[2] Rayid Ghani, et al. Analyzing the effectiveness and applicability of co-training, 2000, CIKM '00.
[3] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[4] Forrest N. Iandola, et al. How to scale distributed deep learning?, 2016, ArXiv.
[5] Stefan Wrobel, et al. Efficient Decentralized Deep Learning by Dynamic Model Averaging, 2018, ECML/PKDD.
[6] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[7] Oleksandr Makeyev, et al. Neural network with ensembles, 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).
[8] P. Lax, et al. Multivariable Calculus with Applications, 2018.
[9] Ramesh Raskar, et al. Distributed learning of deep neural network over multiple agents, 2018, J. Netw. Comput. Appl.
[10] Martin Jaggi, et al. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication, 2019, ICML.
[11] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[12] Peter Richtárik, et al. Federated Optimization: Distributed Machine Learning for On-Device Intelligence, 2016, ArXiv.
[13] H. Robbins, et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications, 1985.
[14] Aryan Mokhtari, et al. Robust and Communication-Efficient Collaborative Learning, 2019, NeurIPS.
[15] Richard Nock, et al. Advances and Open Problems in Federated Learning, 2019, Found. Trends Mach. Learn.
[16] Chinmay Hegde, et al. Collaborative Deep Learning in Fixed Topology Networks, 2017, NIPS.
[17] Vladimir Braverman, et al. Communication-Efficient Distributed SGD with Sketching, 2019, NeurIPS.
[18] Xu Lan, et al. Knowledge Distillation by On-the-Fly Native Ensemble, 2018, NeurIPS.
[19] Thomas Hofmann, et al. Communication-Efficient Distributed Dual Coordinate Ascent, 2014, NIPS.
[20] Raj Kumar Maity, et al. vqSGD: Vector Quantized Stochastic Gradient Descent, 2019, IEEE Transactions on Information Theory.
[21] Xiangru Lian, et al. D2: Decentralized Training over Decentralized Data, 2018, ICML.
[22] Asuman E. Ozdaglar, et al. Distributed Subgradient Methods for Multi-Agent Optimization, 2009, IEEE Transactions on Automatic Control.
[23] Geoffrey E. Hinton, et al. Large scale distributed neural network training through online distillation, 2018, ICLR.
[24] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[25] Wei Zhang, et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent, 2017, NIPS.
[26] Pascal Bianchi, et al. Convergence of a Multi-Agent Projected Stochastic Gradient Algorithm for Non-Convex Optimization, 2011, IEEE Transactions on Automatic Control.
[27] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[28] Ludwig Schmidt, et al. Unlabeled Data Improves Adversarial Robustness, 2019, NeurIPS.
[29] W. Rudin. Principles of Mathematical Analysis, 1964.
[30] Huchuan Lu, et al. Deep Mutual Learning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Mikhail Belkin, et al. A Co-Regularization Approach to Semi-supervised Learning with Multiple Views, 2005.
[32] Yann LeCun, et al. Deep learning with Elastic Averaging SGD, 2014, NIPS.
[33] Jakub Konečný, et al. Federated Optimization: Distributed Optimization Beyond the Datacenter, 2015, ArXiv.
[34] Peter Richtárik, et al. Federated Learning: Strategies for Improving Communication Efficiency, 2016, ArXiv.
[35] Christoforos N. Hadjicostis, et al. Distributed strategies for average consensus in directed graphs, 2011, IEEE Conference on Decision and Control and European Control Conference.
[36] Avrim Blum, et al. Combining Labeled and Unlabeled Data with Co-Training, 1998, COLT.
[37] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Alfredo N. Iusem, et al. On the projected subgradient method for nonsmooth convex optimization in a Hilbert space, 1998, Math. Program.
[39] Thomas Gärtner, et al. Efficient co-regularised least squares regression, 2006, ICML.
[40] Amir Salman Avestimehr, et al. Group Knowledge Transfer: Collaborative Training of Large CNNs on the Edge, 2020, ArXiv.
[41] Rich Caruana, et al. Model compression, 2006, KDD '06.
[42] J. Norris. Appendix: probability and measure, 1997.
[43] Jianyu Wang, et al. Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms, 2018, ArXiv.
[44] Martin J. Wainwright, et al. Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling, 2010, IEEE Transactions on Automatic Control.
[45] Behrouz Touri, et al. Non-Convex Distributed Optimization, 2015, IEEE Transactions on Automatic Control.