Issues in information theory-based statistical inference—a commentary from a frequentist’s perspective

After several decades in which applied statistical inference in research on animal behaviour and behavioural ecology was heavily dominated by null hypothesis significance testing (NHST), a new approach based on information-theoretic (IT) criteria has recently become increasingly popular, and it has occasionally been considered generally superior to conventional NHST. In this commentary, I discuss some limitations the IT-based method may have under certain circumstances. In addition, I review some recent articles published in the fields of animal behaviour and behavioural ecology and point to some common failures, misunderstandings and issues that frequently appear in the practical application of IT-based methods. Based on this, I give some hints on how to avoid common pitfalls in the application of IT-based inference and when to choose one approach or the other, and I discuss the circumstances under which mixing the two approaches might be appropriate.
