Distributed Learning over Unreliable Networks
Dan Alistarh | Ji Liu | Simon Kassing | Ankit Singla | Hanlin Tang | Ce Zhang | Chen Yu | Cédric Renggli