Ecologists overestimate the importance of predictor variables in model averaging: a plea for cautious interpretations

Summary
1. Information-theoretic procedures are powerful tools for multimodel inference and are now standard methods in ecology. When performing model averaging on a given set of models, the importance of a predictor variable is commonly estimated by summing the weights of the models in which the variable appears, the so-called sum of weights (SW). However, SWs have received little methodological attention and are frequently misinterpreted.
2. We assessed the reliability of SW by performing model selection and averaging on simulated data sets that included variables strongly and weakly correlated with the response variable and a variable unrelated to the response. Our aim was to investigate how useful SWs are for assessing the relative importance of predictor variables.
3. SW can take a wide range of possible values, even for predictor variables unrelated to the response. As a consequence, an SW with an intermediate value cannot be confidently interpreted as denoting importance for the predictor variable in question. Increasing sample size, using an alternative information criterion for model selection, or using only a subset of candidate models for model averaging did not qualitatively change our results: a variable of a given effect size can take a wide range of SW values.
4. Contrary to what is assumed in many ecological studies, it seems hazardous to define a threshold for SW above which a variable is considered to have a statistical effect on the response, and SW is not a measure of effect size. Although we did not consider every possible condition of analysis, it is likely that in most situations SW is a poor estimate of a variable's importance.
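To make the quantity discussed in the summary concrete, the following is a minimal sketch of how a sum of weights is typically computed from a set of candidate models: each model's AIC is converted to an Akaike weight, and the weights of all models containing a given variable are added up. The variable names (`x1`, `x2`), the candidate set and the AIC values are hypothetical and chosen only for illustration; they are not taken from the study's simulations.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given the data: exp(-delta_i / 2)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def sum_of_weights(model_terms, weights):
    """Sum the weights of the models in which each variable appears."""
    sw = {}
    for terms, w in zip(model_terms, weights):
        for term in terms:
            sw[term] = sw.get(term, 0.0) + w
    return sw


# Hypothetical candidate set: all subsets of two predictors.
# The AIC values are made up for illustration only.
model_terms = [(), ("x1",), ("x2",), ("x1", "x2")]
aic_values = [110.0, 100.0, 111.5, 101.8]

weights = akaike_weights(aic_values)
sw = sum_of_weights(model_terms, weights)

for term in sorted(sw):
    print(f"SW({term}) = {sw[term]:.3f}")
```

In this made-up example, adding `x2` to a model never lowers the AIC, yet its SW is still about 0.29, because the model containing both `x1` and `x2` sits within two AIC units of the best model and so keeps a substantial weight. This is the kind of intermediate value that the summary warns against reading as evidence of importance.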
