On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms

In recent years, filtering-clustering problems have been a central topic in statistics and machine learning, notably the $\ell_1$-trend filtering and $\ell_2$-convex clustering problems. In practice, such structured problems are typically solved by first-order algorithms despite the extreme ill-conditioning of the underlying difference operator matrices. Motivated by the need to analyze the convergence rates of these algorithms, we show that for a large class of filtering-clustering problems, a \textit{global error bound} condition is satisfied by the dual filtering-clustering problem when a suitable regularization is chosen. Based on this result, we show that many first-order algorithms attain the \textit{optimal rate of convergence} in different settings. In particular, we establish a generalized dual gradient ascent (GDGA) algorithmic framework with several subroutines. In the deterministic setting, when the subroutine is accelerated gradient descent (AGD), the resulting algorithm attains linear convergence; this linear convergence also holds in the finite-sum setting, in which the subroutine is the Katyusha algorithm. We also demonstrate that GDGA with a stochastic gradient descent (SGD) subroutine attains the optimal rate of convergence up to a logarithmic factor, shedding light on the possibility of solving filtering-clustering problems efficiently in the online setting. Experiments conducted on $\ell_1$-trend filtering problems illustrate the favorable performance of our algorithms over other competing algorithms.
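
To make the objects in the abstract concrete: the $\ell_1$-trend filtering problem of [8] is $\min_{\beta} \tfrac{1}{2}\|y-\beta\|_2^2 + \lambda\|D\beta\|_1$, where $D$ is a discrete difference operator, and its dual is the box-constrained quadratic program $\max_{\|u\|_\infty \le \lambda}\, u^\top D y - \tfrac{1}{2}\|D^\top u\|_2^2$ with primal recovery $\beta = y - D^\top u$. The sketch below runs plain projected gradient ascent on this dual. It is only a minimal stand-in for the GDGA framework (it uses none of the AGD, Katyusha, or SGD subroutines), and the step size, iteration count, regularization level, and synthetic signal are illustrative assumptions, not choices from the paper.

```python
# Minimal sketch: projected dual gradient ascent for l1-trend filtering.
# Illustrative only; the paper's GDGA framework swaps the inner loop for
# an AGD, Katyusha, or SGD subroutine.
import numpy as np

def difference_operator(n, order=2):
    """Build the k-th order discrete difference operator D of shape
    (n - order, n); order=2 gives piecewise-linear trend filtering."""
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)  # each pass applies a first difference
    return D

def dual_gradient_ascent(y, lam, order=2, iters=5000):
    """Maximize the dual  u^T D y - 0.5 * ||D^T u||^2  over ||u||_inf <= lam,
    then recover the primal solution beta = y - D^T u."""
    D = difference_operator(len(y), order)
    u = np.zeros(D.shape[0])
    step = 1.0 / np.linalg.norm(D @ D.T, 2)  # 1/L, L = lambda_max(D D^T)
    for _ in range(iters):
        grad = D @ (y - D.T @ u)                 # gradient of the dual
        u = np.clip(u + step * grad, -lam, lam)  # projection onto the box
    return y - D.T @ u                           # primal recovery

# Toy usage: denoise a noisy piecewise-linear signal (hypothetical data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
y = np.where(t < 0.5, 2 * t, 2 - 2 * t) + 0.05 * rng.standard_normal(200)
beta = dual_gradient_ascent(y, lam=1.0)
```

The fixed iteration count and the conservative $1/\lambda_{\max}(DD^\top)$ step size are purely for illustration; in practice the inner loop would be replaced by one of the accelerated or stochastic subroutines discussed above.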

[1] C. Leser. A Simple Method of Trend Construction, 1961.

[2] Kean Ming Tan, et al. Statistical properties of convex clustering. Electronic Journal of Statistics, 2015.

[3] Eric C. Chi, et al. Splitting Methods for Convex Clustering. Journal of Computational and Graphical Statistics, 2013.

[4] Francis R. Bach, et al. Stochastic Variance Reduction Methods for Saddle-Point Problems. NIPS, 2016.

[5] Ryan J. Tibshirani, et al. Fast and Flexible ADMM Algorithms for Trend Filtering. arXiv, 2014.

[6] Wei Hu, et al. Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity. AISTATS, 2018.

[7] Ashish Cherukuri, et al. Saddle-Point Dynamics: Conditions for Asymptotic Stability of Saddle Points. SIAM J. Control Optim., 2015.

[8] Stephen P. Boyd, et al. $\ell_1$ Trend Filtering. SIAM Rev., 2009.

[9] Zeyuan Allen-Zhu. Katyusha: The First Direct Acceleration of Stochastic Gradient Methods. STOC, 2017.

[10] Jong-Shi Pang. A Posteriori Error Bounds for the Linearly-Constrained Variational Inequality Problem. Math. Oper. Res., 1987.

[11] Roger Fletcher. On the Barzilai-Borwein Method, 2005.

[13] Wei Pan, et al. A New Algorithm and Theory for Penalized Regression-based Clustering. J. Mach. Learn. Res., 2016.

[14] Chih-Jen Lin, et al. Iteration complexity of feasible descent methods for convex optimization. J. Mach. Learn. Res., 2014.

[15] Anthony Man-Cho So, et al. A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo–Tseng error bound property. Math. Program., 2016.

[16] D. Bertsekas, et al. Two-Metric Projection Methods for Constrained Optimization, 1984.

[17] P. Tseng, et al. On the linear convergence of descent methods for convex essentially smooth minimization, 1992.

[18] R. Tyrrell Rockafellar. Convex Analysis. Princeton Landmarks in Mathematics and Physics, 1970.

[19] Heinz H. Bauschke, et al. On Projection Algorithms for Solving Convex Feasibility Problems. SIAM Rev., 1996.

[20] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, 2014.

[21] Dmitriy Drusvyatskiy, et al. Error Bounds, Quadratic Growth, and Linear Convergence of Proximal Methods. Math. Oper. Res., 2016.

[22] Shimrit Shtern, et al. Linearly convergent away-step conditional gradient for non-strongly convex functions. Math. Program., 2015.

[23] James Sharpnack, et al. Adaptive Non-Parametric Regression with the $K$-NN Fused Lasso. arXiv:1807.11641, 2018.

[24] Alessandro Rinaldo, et al. A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening. NIPS, 2017.

[25] Lin Xiao, et al. Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms. ICML, 2017.

[26] R. Tibshirani, et al. Sparsity and smoothness via the fused lasso, 2005.

[27] Abderrahim Jourani. Hoffman's Error Bound, Local Controllability, and Sensitivity Analysis. SIAM J. Control Optim., 2000.

[28] P. Radchenko, et al. Convex clustering via $\ell_1$ fusion penalization, 2016.

[29] Yi Zhou, et al. An optimal randomized incremental gradient method. Math. Program., 2015.

[30] R. Tibshirani. Adaptive piecewise polynomial estimation via trend filtering. arXiv:1304.2986, 2013.

[31] P. Radchenko, et al. Consistent clustering using $\ell_1$ fusion penalty, 2014.

[32] Alexander J. Smola, et al. Trend Filtering on Graphs. J. Mach. Learn. Res., 2014.

[33] Francis R. Bach, et al. Clusterpath: An Algorithm for Clustering using Convex Fusion Penalties. ICML, 2011.

[34] Paul Tseng, et al. Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program., 2010.

[35] James G. Scott, et al. The DFS Fused Lasso: Linear-Time Denoising over General Graphs. J. Mach. Learn. Res., 2016.

[36] Kim-Chuan Toh, et al. An Efficient Semismooth Newton Based Algorithm for Convex Clustering. ICML, 2018.

[37] Stephen P. Boyd, et al. Proximal Algorithms. Found. Trends Optim., 2013.

[38] Ohad Shamir, et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. ICML, 2011.

[39] Yuchen Zhang, et al. Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization. ICML, 2014.

[40] Shuicheng Yan, et al. Convex Optimization Procedure for Clustering: Theoretical Revisit. NIPS, 2014.

[41] Martin J. Wainwright, et al. Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization. IEEE Transactions on Information Theory, 2010.

[42] L. Rudin, et al. Nonlinear total variation based noise removal algorithms, 1992.

[43] R. Tyrrell Rockafellar, et al. Convergence Rates in Forward-Backward Splitting. SIAM J. Optim., 1997.

[44] Jiayu Zhou, et al. Robust Convex Clustering Analysis. IEEE ICDM, 2016.

[45] R. Tibshirani, et al. The solution path of the generalized lasso. arXiv:1005.1971, 2010.

[46] Abderrahim Jourani. Erratum: Hoffman's Error Bound, Local Controllability, and Sensitivity Analysis. SIAM J. Control Optim., 2000.

[47] Z.-Q. Luo, et al. Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res., 1993.