Damping proximal coordinate descent algorithm for non-convex regularization

Non-convex regularization has attracted much attention in the fields of machine learning, since it is unbiased and improves the performance on many applications compared with the convex counterparts. The optimization is important but difficult for non-convex regularization. In this paper, we propose the Damping Proximal Coordinate Descent (DPCD) algorithms that address the optimization issues of a general family of non-convex regularized problems. DPCD is guaranteed to be globally convergent. The computational complexity of obtaining an approximately stationary solution with a desired precision is only linear to the data size. Our experiments on many machine learning benchmark datasets also show that DPCD has a fast convergence rate and it reduces the time of training models without significant loss of prediction accuracy.

[1]  Changshui Zhang,et al.  Relaxed sparse eigenvalue conditions for sparse estimation via non-convex regularized regression , 2013, Pattern Recognit..

[2]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[3]  David J. Field,et al.  What Is the Goal of Sensory Coding? , 1994, Neural Computation.

[4]  Jieping Ye,et al.  A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems , 2013, ICML.

[5]  D. Hunter,et al.  A Tutorial on MM Algorithms , 2004 .

[6]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[7]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[8]  Alan L. Yuille,et al.  The Concave-Convex Procedure (CCCP) , 2001, NIPS.

[9]  Xiaotong Shen,et al.  Journal of the American Statistical Association Likelihood-based Selection and Sharp Parameter Estimation Likelihood-based Selection and Sharp Parameter Estimation , 2022 .

[10]  I. Daubechies,et al.  Iteratively reweighted least squares minimization for sparse recovery , 2008, 0807.0575.

[11]  Jianqing Fan,et al.  Nonconcave penalized likelihood with a diverging number of parameters , 2004, math/0406466.

[12]  Stephen J. Wright,et al.  Sparse reconstruction by separable approximation , 2009, IEEE Trans. Signal Process..

[13]  S. Foucart,et al.  Sparsest solutions of underdetermined linear systems via ℓq-minimization for 0 , 2009 .

[14]  Tong Zhang,et al.  Analysis of Multi-stage Convex Relaxation for Sparse Regularization , 2010, J. Mach. Learn. Res..

[15]  Edward H. Adelson,et al.  Noise removal via Bayesian wavelet coring , 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[16]  Zhihua Zhang,et al.  A non-convex relaxation approach to sparse dictionary learning , 2011, CVPR 2011.

[17]  J. Borwein,et al.  Two-Point Step Size Gradient Methods , 1988 .

[18]  Frédo Durand,et al.  Image and depth from a conventional camera with a coded aperture , 2007, SIGGRAPH 2007.

[19]  Guillermo Sapiro,et al.  Robust anisotropic diffusion , 1998, IEEE Trans. Image Process..

[20]  Tong Zhang,et al.  A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems , 2011, 1108.4988.

[21]  Changshui Zhang,et al.  High-dimensional Inference via Lipschitz Sparsity-Yielding Regularizers , 2013, AISTATS.

[22]  Chih-Jen Lin,et al.  Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines , 2008, J. Mach. Learn. Res..

[23]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[24]  David L. Donoho,et al.  De-noising by soft-thresholding , 1995, IEEE Trans. Inf. Theory.

[25]  Jian Huang,et al.  COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION. , 2011, The annals of applied statistics.

[26]  T. Hastie,et al.  SparseNet: Coordinate Descent With Nonconvex Penalties , 2011, Journal of the American Statistical Association.

[27]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[28]  Stephen P. Boyd,et al.  Enhancing Sparsity by Reweighted ℓ1 Minimization , 2007, 0711.1612.

[29]  Stéphane Canu,et al.  Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming , 2009, IEEE Transactions on Signal Processing.