Self-concordant analysis for logistic regression

Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the $\ell_2$-norm and regularization by the $\ell_1$-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
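
For context, here is a minimal sketch of the property underlying this style of analysis (standard facts stated for orientation; the notation $\varphi$ and $\sigma$ is introduced here for illustration and is not taken verbatim from the paper). A three-times differentiable convex function $F$ is self-concordant in the classical sense of Nesterov and Nemirovskii if
$$ |F'''(x)[h,h,h]| \leq 2 \left( F''(x)[h,h] \right)^{3/2} $$
for all $x$ and all directions $h$. The logistic loss $\varphi(u) = \log(1 + e^{-u})$ satisfies the closely related inequality $|\varphi'''(u)| \leq \varphi''(u)$: writing $\sigma(u) = (1 + e^{-u})^{-1}$, one checks that $\varphi''(u) = \sigma(u)(1 - \sigma(u))$ and $\varphi'''(u) = \varphi''(u)(1 - 2\sigma(u))$, with $|1 - 2\sigma(u)| \leq 1$. Bounds of this type control how quickly the Hessian of the empirical risk can vary, which is, roughly, what allows arguments written for the quadratic (square-loss) case to be transferred to the logistic loss.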
