Another look at linear programming for feature selection via methods of regularization

We consider statistical procedures for feature selection defined by a family of regularization problems with convex piecewise linear loss functions and ℓ1-type penalties. Many well-known statistical procedures, for example quantile regression and the support vector machine with the ℓ1-norm penalty, fall into this category. Computationally, the regularization problems are linear programming (LP) problems indexed by a single parameter, known in optimization theory as 'parametric cost LP' or 'parametric right-hand-side LP'. Exploiting this connection with LP theory, we lay out general algorithms, namely the simplex algorithm and a variant of it, for generating the regularized solution paths of these feature selection problems. The significance of such algorithms is that they allow a complete exploration of the model space along the paths and provide a broad view of the persistent features in the data. The implications of the general path-finding algorithms are outlined for several statistical procedures and illustrated with numerical examples.
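
To make the LP structure concrete, the following Python sketch (assuming NumPy and SciPy, neither of which the paper prescribes) casts ℓ1-norm quantile regression, one member of the family above, as a standard-form LP and solves it at a few fixed penalty values. This is only a grid approximation of the solution path; the paper's parametric simplex instead pivots along the exact breakpoints of the piecewise linear path in the penalty parameter.

```python
# A minimal sketch, not the paper's algorithm: it solves the
# l1-penalized quantile regression LP at a few fixed penalty values
# with scipy.optimize.linprog, whereas the paper's parametric simplex
# traces the exact piecewise linear path between breakpoints.
import numpy as np
from scipy.optimize import linprog

def l1_quantile_regression(X, y, lam, tau=0.5):
    """Solve min_beta sum_i rho_tau(y_i - x_i'beta) + lam*||beta||_1 as an LP.

    Variables: beta = bp - bm and residual splits u, v, all nonnegative,
    with the equality constraint X bp - X bm + u - v = y.
    """
    n, p = X.shape
    # Costs: lam on the |beta| parts, tau/(1 - tau) on the residual parts
    # (the check loss rho_tau weights positive and negative residuals).
    c = np.concatenate([lam * np.ones(2 * p),
                        tau * np.ones(n),
                        (1.0 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    bp, bm = res.x[:p], res.x[p:2 * p]
    return bp - bm

# Grid approximation to the path the paper computes exactly:
# coefficients shrink to zero one by one as the penalty grows.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + rng.standard_normal(100)
for lam in (0.1, 1.0, 10.0, 50.0):
    print(lam, np.round(l1_quantile_regression(X, y, lam), 3))
```

Since the optimal basis changes only at finitely many breakpoints of the penalty parameter, the path-following algorithms the abstract refers to recover the entire solution path at a cost comparable to solving a small number of such LPs.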
