LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
Tianyi Chen | Georgios B. Giannakis | Tao Sun | Wotao Yin
[1] Kevin Baker, et al. Classification of radar returns from the ionosphere using neural networks, 1989.
[2] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2004, Applied Optimization.
[3] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[4] Le Song, et al. Supervised feature selection via dependence estimation, 2007, ICML '07.
[5] Asuman E. Ozdaglar, et al. Distributed Subgradient Methods for Multi-Agent Optimization, 2009, IEEE Transactions on Automatic Control.
[6] Ming Yan, et al. ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates, 2015, SIAM J. Sci. Comput.
[7] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[8] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Mathematical Programming.
[9] Qing Ling, et al. Decentralized learning for wireless communications and networking, 2015, arXiv.
[10] Ohad Shamir, et al. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method, 2013, ICML.
[11] Ameet Talwalkar, et al. Federated Multi-Task Learning, 2017, NIPS.
[12] Damek Davis, et al. Convergence Rate Analysis of Several Splitting Schemes, 2014, arXiv:1406.4834.
[13] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[14] Alexander J. Smola, et al. Communication Efficient Distributed Machine Learning with the Parameter Server, 2014, NIPS.
[15] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[16] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[17] Yi Zhou, et al. Communication-efficient algorithms for decentralized and stochastic optimization, 2017, Mathematical Programming.
[18] H. Robbins, et al. A Stochastic Approximation Method, 1951, The Annals of Mathematical Statistics.
[19] Alejandro Ribeiro, et al. Consensus in Ad Hoc WSNs With Noisy Links—Part I: Distributed Estimation of Deterministic Signals, 2008, IEEE Transactions on Signal Processing.
[20] Thomas Hofmann, et al. Communication-Efficient Distributed Dual Coordinate Ascent, 2014, NIPS.
[21] Alfred O. Hero, et al. A Convergent Incremental Gradient Method with a Constant Step Size, 2007, SIAM J. Optim.
[22] Asuman E. Ozdaglar, et al. On the Convergence Rate of Incremental Aggregated Gradient Algorithms, 2015, SIAM J. Optim.
[23] Yuchen Zhang, et al. DiSCO: Distributed Optimization for Self-Concordant Empirical Loss, 2015, ICML.
[24] Yun Yang, et al. Communication-Efficient Distributed Statistical Inference, 2016, Journal of the American Statistical Association.
[25] H. Altay Güvenir, et al. Learning differential diagnosis of erythemato-squamous diseases using voting feature intervals, 1998, Artif. Intell. Medicine.
[26] Randy H. Katz, et al. A Berkeley View of Systems Challenges for AI, 2017, arXiv.
[27] Qing Ling, et al. Asynchronous periodic event-triggered coordination of multi-agent systems, 2017, 2017 IEEE 56th Annual Conference on Decision and Control (CDC).
[28] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[29] Ron Kohavi, et al. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid, 1996, KDD.
[30] Yijun Huang, et al. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization, 2015, NIPS.
[31] D. Rubinfeld, et al. Hedonic housing prices and the demand for clean air, 1978, Journal of Environmental Economics and Management.
[32] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[33] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[34] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[35] Ananda Theertha Suresh, et al. Distributed Mean Estimation with Limited Communication, 2016, ICML.
[36] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[37] Martin J. Wainwright, et al. Communication-efficient algorithms for statistical optimization, 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[38] Francisco Facchinei, et al. Asynchronous Parallel Algorithms for Nonconvex Big-Data Optimization: Model and Convergence, 2016, arXiv.
[39] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[40] Kenneth Heafield, et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[41] Xin Wang, et al. Learning and Management for Internet of Things: Accounting for Adaptivity and Scalability, 2018, Proceedings of the IEEE.
[42] Michael I. Jordan, et al. Distributed optimization with arbitrary local solvers, 2015, Optim. Methods Softw.
[43] Wotao Yin, et al. Asynchronous Coordinate Descent under More Realistic Assumptions, 2017, NIPS.
[44] Stephen J. Wright, et al. An asynchronous parallel stochastic coordinate descent algorithm, 2013, J. Mach. Learn. Res.