Global Error Bounds and Linear Convergence for Gradient-Based Algorithms for Trend Filtering and $\ell_1$-Convex Clustering

We propose a class of first-order gradient-type optimization algorithms for structured \textit{filtering-clustering problems}, a class that includes trend filtering and $\ell_1$-convex clustering as special cases. Our first main result establishes the linear convergence of deterministic gradient-type algorithms despite the extreme ill-conditioning of the difference operator matrices in these problems. The convergence analysis rests on a convex-concave saddle-point formulation of filtering-clustering problems and on the fact that the dual form of the problem admits a global error bound, a property derived from the celebrated Hoffman bound on the distance between a point and its projection onto the optimal solution set. The same linear convergence rate also holds for stochastic variance-reduced gradient-type algorithms. Finally, we present empirical results showing that the algorithms we analyze perform comparably to state-of-the-art algorithms for trend filtering, while offering advantages in scalability.
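
To make the abstract's key objects concrete: a filtering-clustering problem has the generic form $\min_{x} f(x) + \lambda \|Dx\|_1$, where $f$ is smooth and strongly convex (e.g., $f(x) = \frac{1}{2}\|y - x\|_2^2$ for trend filtering) and $D$ is a difference operator. Since $\|z\|_1 = \max_{\|u\|_\infty \le 1} u^\top z$, the problem is equivalent to the convex-concave saddle-point problem $\min_x \max_{\|u\|_\infty \le \lambda} f(x) + u^\top Dx$. The NumPy sketch below illustrates the resulting projected primal-dual gradient iteration on a first-order trend-filtering instance; the function names and step sizes are illustrative choices, not the tuned parameters analyzed in the paper.

import numpy as np

def difference_matrix(n: int, k: int = 1) -> np.ndarray:
    """k-th order discrete difference operator D, of shape (n - k, n)."""
    D = np.eye(n)
    for _ in range(k):
        D = D[1:] - D[:-1]
    return D

def primal_dual_trend_filter(y, lam, k=1, eta=0.1, tau=0.1, iters=5000):
    """Projected primal-dual gradient method for the saddle-point problem
        min_x max_{||u||_inf <= lam}  0.5 * ||y - x||^2 + u^T D x,
    equivalent to trend filtering: min_x 0.5 * ||y - x||^2 + lam * ||D x||_1.
    Step sizes eta, tau are illustrative, not the paper's tuned values.
    """
    D = difference_matrix(len(y), k)
    x = y.astype(float)
    u = np.zeros(D.shape[0])
    for _ in range(iters):
        # Primal gradient descent step on x |-> 0.5 * ||y - x||^2 + u^T D x.
        x = x - eta * (x - y + D.T @ u)
        # Dual gradient ascent step on u |-> u^T D x,
        # then projection onto the box {u : ||u||_inf <= lam}.
        u = np.clip(u + tau * (D @ x), -lam, lam)
    return x

# Usage: denoise a noisy piecewise-constant signal.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * rng.standard_normal(100)
x_hat = primal_dual_trend_filter(y, lam=1.0)

Note that the dual objective here is only concave, not strongly concave; the point of the paper's error-bound analysis is that iterations of this type nonetheless converge linearly.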
