Performance of several variable-selection methods applied to real ecological data

I evaluated the predictive ability of statistical models obtained by applying seven methods of variable selection to 12 ecological and environmental data sets. Cross-validation, involving repeated splits of each data set into training and validation subsets, was used to obtain honest estimates of predictive ability that could be fairly compared among methods. There was surprisingly little difference in predictive ability among the five methods based on multiple linear regression. Stepwise methods performed similarly to exhaustive algorithms for subset selection, and the choice of criterion for comparing models (Akaike's information criterion, Schwarz's Bayesian information criterion or F statistics) had little effect on predictive ability. For most of the data sets, the two methods based on regression trees yielded models with substantially lower predictive ability than the regression-based methods. I argue that there is no 'best' method of variable selection and that any of the regression-based approaches discussed here is capable of yielding useful predictive models.
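The cross-validation design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: it uses ordinary least squares and an exhaustive all-subsets search scored by AIC or BIC in their Gaussian forms, n ln(RSS/n) + 2k and n ln(RSS/n) + k ln(n), with additive constants dropped. It also assumes a 50/50 training/validation split, 50 random splits, and mean squared prediction error on the validation subset as the measure of predictive ability. The function names (`ols_fit`, `best_subset`, `repeated_split_cv`) and the synthetic data are hypothetical. Stepwise search, F-statistic criteria and regression trees are not shown.

```python
import itertools
import math
import random


def ols_fit(X, y):
    """Least-squares coefficients (intercept first) from the normal equations,
    solved by Gauss-Jordan elimination with partial pivoting."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, x):
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def rss(beta, X, y):
    return sum((yi - predict(beta, x)) ** 2 for x, yi in zip(X, y))


def criterion(rss_value, n, k, kind):
    """Gaussian AIC or BIC up to an additive constant. k counts regression
    coefficients; also counting the error variance would add the same amount
    to every candidate model and so would not change the ranking."""
    base = n * math.log(rss_value / n)
    return base + (2 * k if kind == "aic" else k * math.log(n))


def best_subset(X, y, kind):
    """Exhaustive search over all predictor subsets; keep the lowest AIC/BIC."""
    n, m = len(y), len(X[0])
    best = None
    for size in range(m + 1):
        for cols in itertools.combinations(range(m), size):
            Xs = [[x[c] for c in cols] for x in X]
            beta = ols_fit(Xs, y)
            score = criterion(rss(beta, Xs, y), n, len(beta), kind)
            if best is None or score < best[0]:
                best = (score, cols, beta)
    return best[1], best[2]


def repeated_split_cv(X, y, kind, n_splits=50, train_frac=0.5, seed=1):
    """Mean squared prediction error of the selected model, averaged over
    repeated random training/validation splits. Selection is rerun on each
    training subset, so the estimate covers the whole selection procedure."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    errors = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        tr, va = idx[:cut], idx[cut:]
        cols, beta = best_subset([X[i] for i in tr], [y[i] for i in tr], kind)
        sq = [(y[i] - predict(beta, [X[i][c] for c in cols])) ** 2 for i in va]
        errors.append(sum(sq) / len(sq))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 4 candidate predictors, only the first two matter.
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
    y = [1.0 + 2.0 * x[0] - 1.0 * x[1] + rng.gauss(0, 1) for x in X]
    for kind in ("aic", "bic"):
        print(kind, round(repeated_split_cv(X, y, kind), 3))
```

Rerunning the selection inside every training subset is the point of the design: choosing a model once from the full data set and only then cross-validating its fit would give optimistically biased estimates of predictive ability, because the validation observations would already have influenced which variables were kept.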
