PopSGD: Decentralized Stochastic Gradient Descent in the Population Model
Dan Alistarh | Giorgi Nadiradze | Ilia Markov | Aditya Sharma | Vitaly Aksenov | Amirmojtaba Sabour