Stochastic distributed learning with gradient quantization and double-variance reduction
Sebastian U. Stich | Peter Richtárik | Konstantin Mishchenko | D. Kovalev | Samuel Horváth