Solving Non-smooth Constrained Programs with Lower Complexity than $\mathcal{O}(1/\varepsilon)$: A Primal-Dual Homotopy Smoothing Approach

We propose a new primal-dual homotopy smoothing algorithm for linearly constrained convex programs in which neither the primal nor the dual function has to be smooth or strongly convex. The best known iteration complexity for solving such a non-smooth problem is $\mathcal{O}(\varepsilon^{-1})$. In this paper, we show that by leveraging a local error bound condition on the dual function, the proposed algorithm achieves a better primal convergence time of $\mathcal{O}\left(\varepsilon^{-2/(2+\beta)}\log_2(\varepsilon^{-1})\right)$, where $\beta\in(0,1]$ is the local error bound parameter. As an example application, we show that the distributed geometric median problem, which can be formulated as a constrained convex program, has a dual function that is non-smooth yet satisfies the aforementioned local error bound condition with $\beta=1/2$, and therefore enjoys a convergence time of $\mathcal{O}\left(\varepsilon^{-4/5}\log_2(\varepsilon^{-1})\right)$. This result improves upon the $\mathcal{O}(\varepsilon^{-1})$ convergence time bound achieved by existing distributed optimization algorithms. Simulation experiments also demonstrate the performance of the proposed algorithm.
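
As a quick check of the rates quoted above, the exponent $2/(2+\beta)$ can be evaluated at the two values of $\beta$ of interest. This is only a sketch of the arithmetic. The consensus formulation in the second display is a standard way to write a distributed geometric median problem and is an assumption here, not taken from the text: the $n$ agents, their local points $b_i$, the local copies $x_i$, and the edge set $\mathcal{E}$ of a connected communication graph are illustrative names.

$$
\beta=\tfrac{1}{2}:\quad \frac{2}{2+\beta}=\frac{2}{5/2}=\frac{4}{5},
\qquad
\beta=1:\quad \frac{2}{2+\beta}=\frac{2}{3}.
$$

$$
\min_{x_1,\dots,x_n}\ \sum_{i=1}^{n}\|x_i-b_i\|_2
\quad\text{subject to}\quad x_i=x_j\ \ \text{for all } (i,j)\in\mathcal{E}.
$$

Since $2/(2+\beta)<1$ for every $\beta\in(0,1]$, the ratio of the new bound to the old one, $\varepsilon^{1-2/(2+\beta)}\log_2(\varepsilon^{-1})$, tends to zero as $\varepsilon\to 0$, so the logarithmic factor does not cancel the improvement over $\mathcal{O}(\varepsilon^{-1})$ for small $\varepsilon$.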
