DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization
Lin Xiao | Adams Wei Yu | Qihang Lin | Weizhu Chen