Minimizing Nonconvex Non-Separable Functions

Regularization has played a key role in deriving sensible estimators in high-dimensional statistical inference. A substantial body of recent work has argued for nonconvex regularizers because of their superior theoretical properties and excellent practical performance. In a different but analogous vein, nonconvex loss functions are promoted for their robustness against "outliers". However, these nonconvex formulations are computationally more challenging, especially in the presence of nonsmoothness and nonseparability. To address this issue, we propose a new proximal gradient meta-algorithm that rigorously extends the proximal average to the nonconvex setting. We formally establish its convergence properties and illustrate its effectiveness on two applications: multi-task graph-guided fused lasso and robust support vector machines. Experiments demonstrate that our method compares favorably against alternative approaches.
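The abstract names the meta-algorithm without stating its update. The following is a minimal sketch of the idea. It assumes that the proximal-average proximal gradient update from the convex case, x ← Σ_k α_k P^μ_{f_k}(x − μ∇ℓ(x)), carries over unchanged to the nonconvex setting. A graph-guided regularizer that sums a nonconvex capped-ℓ1 penalty λ min(|x_i − x_j|, θ) over edges is non-separable, so its joint proximal map is hard to compute. Each single-edge proximal map, however, reduces to a closed-form scalar problem, and the update simply averages those maps after a gradient step. The squared loss, the capped-ℓ1 edge penalty, the uniform weights α_k = 1/K, the step size μ, the toy data, and the hypothetical function names `prox_capped_l1`, `prox_edge`, and `pa_pg` are all illustrative assumptions, not the paper's formulation or experimental setup.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch (not the paper's code). Problem assumed for illustration:
#   min_x 0.5 * ||x - y||^2 + lam * sum_{(i,j) in edges} min(|x_i - x_j|, theta)
# The nonconvex edge penalty couples coordinates, so the joint prox is hard,
# but each single-edge prox is cheap.


def prox_capped_l1(z: float, lam: float, theta: float, tau: float) -> float:
    """argmin_t lam * min(|t|, theta) + (t - z)^2 / (2 * tau), by comparing two candidates."""
    sign = 1.0 if z >= 0 else -1.0
    # Branch |t| <= theta: soft-thresholding, clipped at theta.
    t_in = sign * min(theta, max(abs(z) - tau * lam, 0.0))
    # Branch |t| >= theta: the penalty is the constant lam * theta.
    t_out = sign * max(abs(z), theta)

    def obj(t: float) -> float:
        return lam * min(abs(t), theta) + (t - z) ** 2 / (2.0 * tau)

    return t_in if obj(t_in) <= obj(t_out) else t_out


def prox_edge(x: Sequence[float], edge: Tuple[int, int], weight: float,
              theta: float, mu: float) -> List[float]:
    """Prox with step mu of x -> weight * min(|x_i - x_j|, theta); only x_i, x_j move."""
    i, j = edge
    s, d = x[i] + x[j], x[i] - x[j]
    # In (s, d) coordinates the squared distance is (ds^2 + dd^2) / 2, so s is
    # unchanged and d receives the scalar prox with step 2 * mu.
    d_new = prox_capped_l1(d, weight, theta, 2.0 * mu)
    out = list(x)
    out[i], out[j] = (s + d_new) / 2.0, (s - d_new) / 2.0
    return out


def pa_pg(y: Sequence[float], edges: Sequence[Tuple[int, int]], lam: float,
          theta: float, mu: float = 0.5, iters: int = 200) -> List[float]:
    """Assumed PA-PG update: x <- sum_k alpha_k * prox_{mu f_k}(x - mu * grad l(x))."""
    k = len(edges)
    alpha = 1.0 / k  # uniform weights; f_k = k * lam * (edge-k penalty)
    x = list(y)
    for _ in range(iters):
        # Gradient step on the smooth loss l(x) = 0.5 * ||x - y||^2 (Lipschitz constant 1).
        z = [xi - mu * (xi - yi) for xi, yi in zip(x, y)]
        # Average the cheap per-edge proxes instead of computing the joint prox.
        x = [0.0] * len(z)
        for e in edges:
            p = prox_edge(z, e, k * lam, theta, mu)
            x = [xa + alpha * pa for xa, pa in zip(x, p)]
    return x


print([round(v, 3) for v in pa_pg([1.0, 1.2, 5.0], [(0, 1), (1, 2)], lam=0.5, theta=1.0)])
# prints [1.067, 1.133, 5.0]
```

On this toy input, the small gap between the first two entries is shrunk. The gap larger than θ is left unpenalized, which is the intended behavior of a capped penalty. The first two entries are not fully fused, whereas the exact minimizer of the original objective fuses them at 1.1. This is consistent with the averaged update targeting a proximal-average surrogate whose gap to the original objective shrinks as μ decreases.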
