Stochastic Gradient Push for Distributed Deep Learning
Michael G. Rabbat | Mahmoud Assran | Nicolas Ballas | Nicolas Loizou
[1] J. Wolfowitz. Products of indecomposable, aperiodic, stochastic matrices, 1963.
[2] Valerie Isham, et al. Non-Negative Matrices and Markov Chains, 1983.
[3] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[4] Johannes Gehrke, et al. Gossip-based computation of aggregate information, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science.
[5] Rolf Rabenseifner, et al. Optimization of Collective Reduction Operations, 2004, International Conference on Computational Science.
[6] E. Seneta. Non-negative Matrices and Markov Chains, 2008.
[7] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[8] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[9] José M. F. Moura, et al. Fast Distributed Gradient Methods, 2011, IEEE Transactions on Automatic Control.
[10] Christoforos N. Hadjicostis, et al. Average Consensus in the Presence of Delays in Directed Graph Topologies, 2014, IEEE Transactions on Automatic Control.
[11] Christoforos N. Hadjicostis, et al. Distributed Finite-Time Average Consensus in Digraphs in the Presence of Time Delays, 2015, IEEE Transactions on Control of Network Systems.
[12] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[13] Yann LeCun, et al. Deep learning with Elastic Averaging SGD, 2014, NIPS.
[14] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[15] Angelia Nedic, et al. Stochastic Gradient-Push for Strongly Convex Functions on Time-Varying Directed Graphs, 2014, IEEE Transactions on Automatic Control.
[16] Matthieu Cord, et al. Gossip training for deep learning, 2016, ArXiv.
[17] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Forrest N. Iandola, et al. How to scale distributed deep learning?, 2016, ArXiv.
[19] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[20] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[21] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[22] Wei Zhang, et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent, 2017, NIPS.
[23] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[24] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[25] Chinmay Hegde, et al. Collaborative Deep Learning in Fixed Topology Networks, 2017, NIPS.
[26] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[27] Takuya Akiba, et al. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes, 2017, ArXiv.
[28] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[29] Kaiming He, et al. Exploring the Limits of Weakly Supervised Pretraining, 2018, ECCV.
[30] Michael G. Rabbat, et al. Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization, 2017, Proceedings of the IEEE.
[31] Wei Zhang, et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent, 2017, ICML.
[32] Yuanzhou Yang, et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes, 2018, ArXiv.
[33] Myle Ott, et al. Scaling Neural Machine Translation, 2018, WMT.
[34] Kamyar Azizzadenesheli, et al. signSGD with Majority Vote is Communication Efficient and Fault Tolerant, 2018, ICLR.
[35] Michael G. Rabbat, et al. Asynchronous Gradient Push, 2018, IEEE Transactions on Automatic Control.