ℓ1-2 Regularized Logistic Regression

Logistic regression has become a fundamental tool for data analysis and prediction in a variety of applications, including health care and the social sciences. Depending on the sparsity assumptions made, logistic regression models often incorporate various regularizations, including the ℓ1 norm, the ℓ2 norm, and several nonconvex penalties. In this paper, we propose a nonconvex ℓ1-2-regularized logistic regression model under the assumption that the coefficients to be recovered are highly sparse. We derive two numerical algorithms with guaranteed convergence, based on the alternating direction method of multipliers (ADMM) and the proximal operator of ℓ1-2. Numerical experiments on real data demonstrate the great potential of the proposed approach.
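As a concrete illustration of the two building blocks named above, the sketch below combines the closed-form proximal operator of ℓ1-2 (following Lou and Yan, "Fast L1–L2 Minimization via a Proximal Operator") with a scaled-dual ADMM loop for the ℓ1-2-regularized logistic model. This is a minimal sketch, not the paper's reference implementation: the choice of L-BFGS for the smooth w-subproblem and the parameters lam, rho, and n_iter are illustrative assumptions.

# Minimal sketch (not the authors' code) of ADMM for
#     min_w  sum_i log(1 + exp(-y_i * x_i^T w)) + lam * (||w||_1 - ||w||_2),
# with labels y_i in {-1, +1}. The z-update uses the closed-form proximal
# operator of ||.||_1 - ||.||_2 due to Lou and Yan; the w-update is a smooth
# subproblem solved here (an assumption) with scipy's L-BFGS.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def prox_l1_minus_l2(y, t):
    """Proximal operator of t * (||x||_1 - ||x||_2) (closed form, Lou-Yan)."""
    norm_inf = np.max(np.abs(y))
    if norm_inf == 0.0:
        return np.zeros_like(y)
    if norm_inf > t:
        z = np.sign(y) * np.maximum(np.abs(y) - t, 0.0)  # soft-thresholding
        return z * (np.linalg.norm(z) + t) / np.linalg.norm(z)
    # 0 < ||y||_inf <= t: minimizer is 1-sparse at a largest-magnitude entry
    x = np.zeros_like(y)
    i = np.argmax(np.abs(y))
    x[i] = norm_inf * np.sign(y[i])
    return x

def admm_l12_logreg(X, y, lam=0.1, rho=1.0, n_iter=100):
    """ADMM (scaled dual form) for logistic loss + lam*(||w||_1 - ||w||_2)."""
    n, d = X.shape
    w = np.zeros(d); z = np.zeros(d); u = np.zeros(d)  # u: scaled dual variable
    for _ in range(n_iter):
        # w-update: smooth subproblem, solved approximately with L-BFGS
        def f(w_):
            margins = -y * (X @ w_)
            return np.sum(np.logaddexp(0.0, margins)) \
                   + 0.5 * rho * np.sum((w_ - z + u) ** 2)
        def grad(w_):
            margins = -y * (X @ w_)
            s = expit(margins)  # numerically stable sigmoid
            return X.T @ (-y * s) + rho * (w_ - z + u)
        w = minimize(f, w, jac=grad, method="L-BFGS-B").x
        z = prox_l1_minus_l2(w + u, lam / rho)  # z-update: closed-form prox
        u = u + w - z                           # dual ascent step
    return z

With labels in {-1, +1}, admm_l12_logreg(X, y) returns the z iterate, which is exactly sparse thanks to the prox step; in practice one would also monitor primal and dual residuals for a stopping test rather than running a fixed number of iterations.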
