Spending degrees of freedom in a poor economy: A case study of building a sightability model for moose in northeastern Minnesota

Sightability models are binary logistic-regression models used to estimate and adjust for visibility bias in wildlife-population surveys. Like many models in wildlife and ecology, sightability models are typically developed from small observational datasets with many candidate predictors. Aggressive model-selection methods are often employed to choose a best model for prediction and effect estimation, despite evidence that such methods can lead to overfitting (i.e., selected models may describe random error or noise rather than true predictor–response curves) and poor predictive ability. We used moose (Alces alces) sightability data from northeastern Minnesota (2005–2007) as a case study to illustrate an alternative approach, which we refer to as degrees-of-freedom (df) spending: sample-size guidelines are used to determine an acceptable level of model complexity, and then a pre-specified model is fit to the data and used for inference. For comparison, we also constructed sightability models using Akaike's Information Criterion (AIC) step-down procedures and model averaging (based on a small set of models developed using df-spending guidelines). We used bootstrap procedures to mimic the process of model fitting and prediction and to compute an index of overfitting, expected predictive accuracy, and model-selection uncertainty. The index of overfitting increased 13% when the number of candidate predictors was increased from three to eight and a best model was selected using step-down procedures. Likewise, model-selection uncertainty increased with the number of candidate predictors. Model averaging (based on R = 30 models with 1–3 predictors) effectively shrank regression coefficients toward zero and produced estimates of precision similar to those from our 3-df pre-specified model. As such, model averaging may help to guard against overfitting when too many predictors are considered relative to the available sample size.
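To see why model averaging shrinks coefficients toward zero, consider a minimal Python sketch. It assumes Akaike weights, with w_r proportional to exp(-ΔAIC_r / 2), and the convention that a predictor absent from model r contributes a coefficient of 0 to the weighted average; the abstract does not state these details, so treat them as assumptions. The three candidate models, their AIC values, and their coefficients are hypothetical, not estimates from the moose data.

```python
import math

def akaike_weights(aics):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def model_averaged_coefs(models, aics, predictors):
    """Weighted average of coefficients across models.

    Each model is a dict mapping predictor name -> coefficient. A predictor
    absent from a model is assumed to have coefficient 0 in that model,
    which is what pulls averaged estimates toward zero.
    """
    weights = akaike_weights(aics)
    return {
        p: sum(w * m.get(p, 0.0) for w, m in zip(weights, models))
        for p in predictors
    }

# Hypothetical candidate set for illustration only.
predictors = ["x1", "x2", "x3"]
models = [
    {"x1": 1.2},
    {"x1": 1.0, "x2": 0.5},
    {"x1": 1.1, "x2": 0.4, "x3": -0.3},
]
aics = [100.0, 101.0, 103.0]

avg = model_averaged_coefs(models, aics, predictors)
for p in predictors:
    print(p, round(avg[p], 3))
```

Because x3 appears only in the least-supported model, its averaged coefficient (about -0.04) is far closer to zero than its within-model estimate of -0.3, and x2 is pulled below both of its conditional estimates. Changing the candidate set changes the weights and therefore the amount of shrinkage.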
The set of candidate models will influence the extent to which coefficients are shrunk toward zero, which has implications for how one might apply model averaging to problems traditionally approached using variable-selection methods. We often recommend the df-spending approach in our consulting work because it is easy to implement and naturally forces investigators to think carefully about their models and predictors. Nonetheless, similar concepts should apply whether one is fitting a single model or using multi-model inference. For example, model-building decisions should consider the effective sample size, and potential predictors should be screened (without looking at their relationship to the response) for missing data, narrow distributions, collinearity, potentially overly influential observations, and measurement errors (e.g., via logical error checks). © 2011 The Wildlife Society.
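The sample-size and screening checks recommended above can be sketched in Python. This is a minimal illustration, not the authors' procedure: it assumes the common convention that the effective sample size for a binary response is the smaller of the event and non-event counts, and it uses a hypothetical rule of one df per 10 effective observations (stricter guidelines of 15 or 20 are also quoted). The screening thresholds, the predictor names (cover, cover_pct, snow, group_size), and all data are hypothetical. Only predictor values enter `screen_predictors`; the response is used solely to count outcomes.

```python
import math

def effective_sample_size(y):
    """Smaller of the event and non-event counts for a 0/1 response (common convention)."""
    events = sum(1 for v in y if v == 1)
    return min(events, len(y) - events)

def df_budget(y, per_df=10):
    """Rough df allowance; per_df=10 is an assumed rule of thumb, not the paper's guideline."""
    return effective_sample_size(y) // per_df

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((z - mb) ** 2 for z in b)
    if saa == 0 or sbb == 0:
        return 0.0  # undefined; constant predictors are caught by the "narrow" check
    return sab / math.sqrt(saa * sbb)

def screen_predictors(data, max_missing=0.1, max_mode_share=0.8, max_abs_r=0.7):
    """Flag predictors using only predictor values, never the response.

    data maps predictor name -> list of values (None = missing).
    All thresholds are illustrative assumptions, not published cutoffs.
    """
    flags = {name: [] for name in data}
    for name, vals in data.items():
        present = [v for v in vals if v is not None]
        if 1 - len(present) / len(vals) > max_missing:
            flags[name].append("missing")
        if present:
            # Share of observations taking the most common value.
            mode_share = max(present.count(v) for v in set(present)) / len(present)
            if mode_share > max_mode_share:
                flags[name].append("narrow")
    names = list(data)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Use pairwise-complete observations only.
            pairs = [(x, z) for x, z in zip(data[a], data[b])
                     if x is not None and z is not None]
            if len(pairs) > 2:
                xs, zs = zip(*pairs)
                if abs(pearson(xs, zs)) > max_abs_r:
                    flags[a].append(f"collinear with {b}")
                    flags[b].append(f"collinear with {a}")
    return flags

# Hypothetical data for illustration only.
y = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0] * 12  # 1 = group seen
print(effective_sample_size(y), df_budget(y), df_budget(y, per_df=15))  # 48 4 3

data = {
    "cover": [0.1, 0.5, 0.9, 0.3, 0.7, None, 0.2, 0.8, 0.4, 0.6],
    "cover_pct": [10, 50, 90, 30, 70, 55, 20, 80, 40, 60],  # cover rescaled
    "snow": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "group_size": [1, 2, 1, 3, None, None, 2, 1, 4, 2],
}
print(screen_predictors(data))
```

Here cover and cover_pct are flagged as collinear, snow as narrowly distributed, and group_size for missing data; none of these decisions required looking at how the predictors relate to sightability.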
