Moniqua: Modulo Quantized Communication in Decentralized SGD