Invexity Preserving Transformations for Projection Free Optimization with Sparsity Inducing Non-convex Constraints

Forward stagewise and Frank-Wolfe are popular gradient-based, projection-free optimization algorithms, and both require convex constraints. We propose a method that extends these algorithms to problems of the form \(\min_x f(x) \quad \text{s.t.} \quad g(x) \le \kappa\), where \(f(x)\) is an invex objective function and \(g(x)\) is a non-convex constraint function. Invexity is a generalization of convexity that ensures that all local optima are also global optima. We provide a theorem that defines a class of monotone component-wise transformations \(x_i = h(z_i)\) under which the transformed constraint function \(G(z) = g(h(z))\) is convex. If the original objective \(f(x)\) is invex, the same transformation \(x_i = h(z_i)\) yields a transformed objective \(F(z) = f(h(z))\) that is also invex. For algorithms that rely on a non-zero gradient \(\nabla F\) to produce new update steps, invexity ensures that they keep making progress as long as a descent direction exists.
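To make the idea of a convexifying component-wise transformation concrete, the following is a minimal sketch under assumptions that are not taken from the abstract: the constraint is taken to be the sparsity-inducing \(\ell_q\) function \(g(x) = \sum_i |x_i|^q\) with a hypothetical exponent \(q = 0.5\), and the transformation is taken to be the monotone power map \(h(z_i) = \mathrm{sign}(z_i)\,|z_i|^{1/q}\). With these choices \(G(z) = g(h(z)) = \sum_i |z_i|\), which is the convex \(\ell_1\) norm. Whether this particular \(h\) belongs to the class covered by the theorem is not stated here, so it should be read as one plausible candidate only. The helper `midpoint_violations` is likewise an illustrative name; it performs a randomized midpoint-convexity check, which can expose non-convexity but does not prove convexity.

```python
import math
import random

Q = 0.5  # hypothetical exponent with 0 < Q < 1 (assumption, not from the source)


def g(x):
    """Non-convex l_q constraint function: sum_i |x_i|^Q."""
    return sum(abs(xi) ** Q for xi in x)


def h(z_i):
    """Monotone component-wise map x_i = sign(z_i) * |z_i|^(1/Q)."""
    return math.copysign(abs(z_i) ** (1.0 / Q), z_i)


def G(z):
    """Transformed constraint G(z) = g(h(z)); reduces to the l1 norm here."""
    return g([h(zi) for zi in z])


def midpoint_violations(func, dim=3, trials=2000, seed=0):
    """Count random pairs (a, b) with func((a+b)/2) > (func(a)+func(b))/2."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        a = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        b = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        mid = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]
        # Small tolerance absorbs floating-point rounding.
        if func(mid) > (func(a) + func(b)) / 2.0 + 1e-9:
            count += 1
    return count


if __name__ == "__main__":
    print("violations for g:", midpoint_violations(g))
    print("violations for G:", midpoint_violations(G))
```

In this sketch the original constraint fails the midpoint test on some sampled pairs, while the transformed constraint does not, which is the behaviour the transformation is meant to deliver for the constraint set.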
