Variable selection in additive models by non-negative garrote

We adapt Breiman’s non-negative garrote method to perform variable selection in non-parametric additive models. The technique avoids hypothesis tests, for which no general, reliable distributional theory is available. It also removes the need for an exhaustive search over all possible models, which is computationally intensive when the number of variables is moderate to high. The method is conceptually simple and computationally fast. It provides accurate predictions and is effective at identifying the variables that generate the model. To illustrate our procedure, we analyse logbook data on blue sharks (Prionace glauca) from the US pelagic longline fishery. We also compare our proposal with a series of available alternatives by simulation. The results show that in all cases our method performs as well as or better than these alternatives.
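To make the idea concrete, the following is a minimal sketch of a garrote-type shrinkage step, not the authors' implementation. It assumes that initial estimates of each additive component, evaluated at the observations, are already available (for example from a backfitting or spline fit), and it uses the penalised form of the garrote criterion with a hypothetical tuning parameter `lam` rather than Breiman's constrained form. The function name `garrote_shrinkage`, the coordinate-descent solver, and the toy data are illustrative assumptions.

```python
def garrote_shrinkage(components, y, lam, n_iter=200):
    """Non-negative shrinkage factors for pre-fitted additive components.

    components: list of p lists; components[j][i] is an initial estimate of
                the j-th component function at observation i (assumed given).
    y:          centred response values, length n.
    lam:        hypothetical penalty parameter (larger values remove more terms).

    Minimises  sum_i (y_i - sum_j c_j * f_ij)^2 + lam * sum_j c_j
    subject to c_j >= 0, by cyclic coordinate descent.
    """
    p, n = len(components), len(y)
    c = [1.0] * p  # start from the unshrunken initial fit
    norms = [sum(v * v for v in f) for f in components]
    for _ in range(n_iter):
        for j in range(p):
            if norms[j] == 0.0:
                c[j] = 0.0  # a component that is identically zero carries no signal
                continue
            # Inner product of component j with the partial residual
            # (response minus the shrunken fit of all other components).
            rho = 0.0
            for i in range(n):
                others = sum(c[k] * components[k][i] for k in range(p) if k != j)
                rho += components[j][i] * (y[i] - others)
            # Closed-form coordinate update, clipped at zero.
            c[j] = max(0.0, (rho - lam / 2.0) / norms[j])
    return c


# Toy example: the first component carries the signal, the second only a little.
f1 = [1.0, -1.0, 1.0, -1.0]
f2 = [1.0, 1.0, -1.0, -1.0]
y = [1.1, -0.9, 0.9, -1.1]  # equals f1 + 0.1 * f2
factors = garrote_shrinkage([f1, f2], y, lam=1.0)
selected = [j for j, cj in enumerate(factors) if cj > 0.0]
```

In this toy case the weak second component receives a shrinkage factor of exactly zero and is dropped, while the first is only mildly shrunk. In practice `lam` would be chosen by a data-driven criterion such as cross-validation.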
