Why do we still use stepwise modelling in ecology and behaviour?

1. The biases and shortcomings of stepwise multiple regression are well established within the statistical literature. However, an examination of papers published in 2004 by three leading ecological and behavioural journals suggested that the use of this technique remains widespread: of 65 papers in which a multiple regression approach was used, 57% of studies used a stepwise procedure.

2. The principal drawbacks of stepwise multiple regression include bias in parameter estimation, inconsistencies among model selection algorithms, an inherent (but often overlooked) problem of multiple hypothesis testing, and an inappropriate focus or reliance on a single best model. We discuss each of these issues with examples.

3. We use a worked example of data on yellowhammer distribution collected over 4 years to highlight the pitfalls of stepwise regression. We show that stepwise regression allows models containing significant predictors to be obtained from each year's data. In spite of the significance of the selected models, they vary substantially between years and suggest patterns that are at odds with those determined by analysing the full, 4-year data set.

4. An information theoretic (IT) analysis of the yellowhammer data set illustrates why the varying outcomes of stepwise analyses arise. In particular, the IT approach identifies large numbers of competing models that could describe the data equally well, showing that no one model should be relied upon for inference.
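To make the multiple-testing and instability problems concrete, the following is a minimal Python sketch (assuming NumPy) of forward selection with a partial F-to-enter threshold, run on simulated data. Everything in it is illustrative: the function names `rss` and `forward_stepwise`, the 4.0 threshold (a common rule-of-thumb value), and the data-generating process are assumptions, not the analysis applied to the yellowhammer data. It also covers forward selection only; full stepwise procedures additionally test whether terms already entered should be removed.

```python
import numpy as np


def rss(design, y):
    """Residual sum of squares from an ordinary least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_stepwise(X, y, f_to_enter=4.0):
    """Forward selection by partial F-to-enter (hypothetical helper).

    Starting from an intercept-only model, repeatedly add the candidate
    column of X with the largest partial F statistic, stopping when no
    candidate exceeds f_to_enter. Returns the sorted indices of the
    selected columns.
    """
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))  # intercept-only starting model
    while True:
        rss_current = rss(design, y)
        best_j, best_f = None, f_to_enter
        for j in range(p):
            if j in selected:
                continue
            trial = np.column_stack([design, X[:, j]])
            df_resid = n - trial.shape[1]
            rss_trial = rss(trial, y)
            # Partial F for one added column: drop in RSS divided by the
            # residual mean square of the larger model.
            f = (rss_current - rss_trial) / (rss_trial / df_resid)
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None:
            return sorted(selected)
        selected.append(best_j)
        design = np.column_stack([design, X[:, best_j]])


if __name__ == "__main__":
    # Hypothetical data, NOT the yellowhammer survey: four replicate
    # "years" drawn from the same process, in which only columns 0 and 1
    # have a (weak) effect and columns 2-9 are pure noise.
    rng = np.random.default_rng(2004)
    n, p = 60, 10
    for year in range(1, 5):
        X = rng.normal(size=(n, p))
        y = 0.3 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)
        print(f"replicate {year}: selected columns {forward_stepwise(X, y)}")
```

Because every simulated "year" comes from an identical process, any difference between the selected sets can only reflect sampling variation, and any of columns 2-9 that enters the model has passed a significance-style threshold despite having no effect at all.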

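The information theoretic alternative described in point 4 can be sketched in the same spirit: fit every subset of candidate predictors, rank the fits by AIC, and convert AIC differences into Akaike weights. This is a minimal illustration under stated assumptions (NumPy; Gaussian linear models with AIC computed as n*log(RSS/n) + 2K, up to an additive constant; the hypothetical helper names and simulated data shown), not the authors' analysis. The cut-off of ΔAIC ≤ 2 used to count "competing" models is a widely used rule of thumb rather than a formal test.

```python
import itertools

import numpy as np


def gaussian_aic(design, y):
    """AIC of an OLS fit, n*log(RSS/n) + 2K, dropping additive constants."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(((y - design @ beta) ** 2).sum())
    k = design.shape[1] + 1  # regression coefficients plus the residual variance
    return n * np.log(rss / n) + 2 * k


def all_subsets_aic(X, y):
    """Return (subset, AIC) for every subset of X's columns, intercept always included."""
    n, p = X.shape
    results = []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            results.append((subset, gaussian_aic(design, y)))
    return results


def akaike_weights(aics):
    """Return (delta-AIC, Akaike weights) for a collection of AIC values."""
    aics = np.asarray(aics, dtype=float)
    delta = aics - aics.min()
    rel = np.exp(-0.5 * delta)  # relative likelihood of each model
    return delta, rel / rel.sum()


if __name__ == "__main__":
    # Hypothetical data: two weak true effects, six noise predictors.
    rng = np.random.default_rng(7)
    n, p = 60, 8
    X = rng.normal(size=(n, p))
    y = 0.3 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

    fits = all_subsets_aic(X, y)
    delta, weights = akaike_weights([aic for _, aic in fits])
    order = np.argsort(delta)
    competitive = [fits[i][0] for i in order if delta[i] <= 2]
    print(f"{len(competitive)} of {len(fits)} models have delta-AIC <= 2")
    print(f"weight of the single best model: {weights[order[0]]:.2f}")
    # Relative importance: summed weight of all models containing predictor j.
    for j in range(p):
        importance = sum(w for (subset, _), w in zip(fits, weights) if j in subset)
        print(f"x{j}: {importance:.2f}")
```

The weight of the top-ranked model, together with the number of models within two AIC units of it, is what indicates whether any single model deserves to be relied upon; summed weights per predictor give a model-averaged view instead of a single in/out decision.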