Gradient Directed Regularization for Linear Regression and Classification

Regularization in linear modeling is viewed as a two-stage process. First, a set of candidate models is defined by a path through the space of joint parameter values; then a point on this path is chosen as the final model. Various path-finding strategies for the first stage of this process are examined, based on the notion of generalized gradient descent. Several of these strategies are seen to produce paths that closely correspond to those induced by commonly used penalization methods. Others give rise to new regularization techniques that are shown to be advantageous in some situations. In all cases, the gradient descent path-finding paradigm can be readily generalized to accommodate a wide variety of loss criteria, leading to robust methods for regression and classification, and to apply user-defined constraints on the parameter values, all with highly efficient computational implementations.
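The two-stage view can be illustrated with a minimal pure-Python sketch, assuming squared-error loss, no intercept, and standardized predictors. Stage one traces a path by a thresholded form of gradient descent: at each step, only the coefficients whose negative-gradient component is at least a fraction `tau` of the largest one (in magnitude) are moved. This threshold rule, and the names `threshold_gradient_path`, `select_on_path`, `tau`, `step`, and `n_steps`, are illustrative assumptions, not details taken from the abstract. With `tau = 0` every coefficient moves (plain gradient descent, whose early-stopped path is known to resemble ridge regression); with `tau = 1` only the largest-gradient coefficient moves, which is close in spirit to incremental forward stagewise fitting and its lasso-like path. Stage two simply picks the point on the path with the lowest error on held-out data. The data below are hypothetical and synthetic.

```python
import random


def threshold_gradient_path(X, y, tau=0.5, step=0.05, n_steps=300):
    """Stage 1 (sketch): trace coefficients by thresholded gradient descent.

    Assumes squared-error risk R(a) = (1/2n) * sum_i (y_i - x_i . a)^2 with no
    intercept, so predictors should be standardized and y roughly centered.
    Only coefficients with |negative gradient| >= tau * max |negative gradient|
    are moved at each step. Returns every coefficient vector visited,
    starting from all zeros.
    """
    n, p = len(X), len(X[0])
    a = [0.0] * p
    path = [a[:]]
    for _ in range(n_steps):
        # Residuals under the current coefficients.
        r = [yi - sum(xij * aj for xij, aj in zip(xi, a)) for xi, yi in zip(X, y)]
        # Negative gradient of R with respect to each a_j: (1/n) * sum_i x_ij * r_i.
        g = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        g_max = max(abs(gj) for gj in g)
        if g_max == 0.0:
            break  # stationary point of R: the path cannot advance further
        for j in range(p):
            if abs(g[j]) >= tau * g_max:
                a[j] += step * g[j]
        path.append(a[:])
    return path


def mean_squared_error(X, y, a):
    """Average squared prediction error of coefficients a on (X, y)."""
    return sum(
        (yi - sum(xij * aj for xij, aj in zip(xi, a))) ** 2 for xi, yi in zip(X, y)
    ) / len(y)


def select_on_path(path, X_val, y_val):
    """Stage 2 (sketch): choose the path point with the lowest validation error."""
    errors = [mean_squared_error(X_val, y_val, a) for a in path]
    best = min(range(len(path)), key=errors.__getitem__)
    return best, path[best]


def make_data(n, p=5):
    """Hypothetical synthetic data: only the first two predictors matter."""
    X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * x[0] - 1.0 * x[1] + random.gauss(0.0, 0.5) for x in X]
    return X, y


random.seed(0)
X_tr, y_tr = make_data(200)
X_va, y_va = make_data(100)

path = threshold_gradient_path(X_tr, y_tr, tau=0.8)
k, a_hat = select_on_path(path, X_va, y_va)
print(k, [round(v, 2) for v in a_hat])
```

Because the loss enters only through the residual-based gradient, replacing that line with the gradient of another loss (for example, Huber's) would give a robust variant under the same path-then-select scheme.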
