A protocol for conducting and presenting results of regression‐type analyses

Summary

Scientific investigation is of value only insofar as relevant results are obtained and communicated, a task that requires organizing, evaluating, analysing and unambiguously communicating the significance of data. In this context, working with ecological data, which reflect the complexities and interactions of the natural world, can be a challenge. Recent innovations in the statistical analysis of multifaceted, interrelated data make it possible to obtain more accurate and meaningful results, but the key decisions about which analyses to use, and which components to present in a scientific paper or report, may be overwhelming. We offer a 10-step protocol that streamlines the analysis of data, enhances understanding of the data, the statistical models and the results, and optimizes communication with the reader about both the procedure and the outcomes. The protocol takes the investigator from study design and organization of data (formulating relevant questions, visualizing data collection, data exploration, identifying dependency), through conducting the analysis (presenting, fitting and validating the model) and presenting output (numerically and visually), to extending the model via simulation. Each step includes procedures to clarify aspects of the data that affect statistical analysis, as well as guidelines for written presentation. Steps are illustrated with examples using data from the literature. Following this protocol reduces the organization, analysis and presentation of what may be an overwhelming avalanche of information to sequential and, more to the point, manageable steps. It provides guidelines for selecting optimal statistical tools to assess data relevance and significance, for choosing which aspects of the analysis to include in a published report and for clearly communicating information.
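As a rough illustration of the analysis stages named above (fitting a model, validating it, and extending it via simulation), the following minimal sketch fits a straight line by ordinary least squares, checks the residuals, and refits the model to data simulated from the fitted parameters. It is not the authors' procedure: the data are made up, the helper names (`fit_line`, `residuals`, `simulate`) are hypothetical, and a simple Gaussian linear model is assumed purely for brevity.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y, a, b):
    """Observed minus fitted values."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(x, a, b, sigma, rng):
    """Draw one response vector from y = a + b*x + Normal(0, sigma)."""
    return [a + b * xi + rng.gauss(0.0, sigma) for xi in x]


rng = random.Random(1)

# Made-up data (assumption): true intercept 2.0, slope 0.5, noise sd 0.3.
x = [i / 10 for i in range(50)]
y = simulate(x, 2.0, 0.5, 0.3, rng)

# Fit the model.
a, b = fit_line(x, y)

# Validate: residuals of a least-squares fit with an intercept sum to ~0;
# in practice they would also be plotted against fitted values and covariates.
res = residuals(x, y, a, b)
sigma_hat = (sum(r * r for r in res) / (len(x) - 2)) ** 0.5

# Extend via simulation: refit to data sets simulated from the fitted
# model to see how much the slope estimate varies.
slopes = [fit_line(x, simulate(x, a, b, sigma_hat, rng))[1] for _ in range(200)]

print(f"intercept={a:.3f} slope={b:.3f} residual sd={sigma_hat:.3f}")
print(f"sd of simulated slopes={statistics.stdev(slopes):.3f}")
```

The spread of the simulated slopes gives an informal sense of the uncertainty in the fitted slope; a real analysis would use the model family and validation plots suited to the data at hand.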
