On Extrapolating Past the Range of Observed Data When Making Statistical Predictions in Ecology

Ecologists are increasingly using statistical models to predict animal abundance and occurrence in unsampled locations. The reliability of such predictions depends on a number of factors, including sample size, the distance between prediction locations and the observed data, and how closely the predictive covariates at sampled locations resemble those at locations where predictions are desired. In this paper, we propose extending Cook’s notion of an independent variable hull (IVH), originally developed for linear regression models, to generalized regression models as a way to help assess the potential reliability of predictions in unsampled areas. Predictions inside the generalized independent variable hull (gIVH) can be regarded as interpolations, while predictions outside the gIVH can be regarded as extrapolations that warrant additional investigation or skepticism. We conduct a simulation study to demonstrate the usefulness of this metric for limiting the scope of spatial inference in model-based abundance estimation from survey counts. In this case, limiting inference to the gIVH substantially reduces bias, especially when survey designs are spatially imbalanced. We also demonstrate the utility of the gIVH in diagnosing problematic extrapolations when estimating the relative abundance of ribbon seals in the Bering Sea as a function of predictive covariates. We suggest that ecologists routinely use diagnostics such as the gIVH to help gauge the reliability of predictions from statistical models (such as generalized linear, generalized additive, and spatio-temporal regression models).
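To make the interpolation/extrapolation distinction concrete, the following is a minimal sketch of the linear-regression IVH that the abstract attributes to Cook, not the gIVH itself. It assumes the usual leverage-based definition: a prediction point x0 lies inside the hull when x0'(X'X)^{-1}x0 does not exceed the largest leverage among the observed design points. The function names and the toy data are hypothetical. The abstract does not spell out the generalized version; one natural analogue, stated here only as an assumption, would compare prediction variances instead of leverages.

```python
import numpy as np


def leverage(X_design, X_new):
    """Return x0' (X'X)^{-1} x0 for every row x0 of X_new."""
    xtx_inv = np.linalg.inv(X_design.T @ X_design)
    # Row-wise quadratic form, computed without an explicit loop.
    return np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)


def inside_ivh(X_design, X_new):
    """Flag prediction rows whose leverage is no larger than the
    maximum leverage among the observed design points."""
    h_max = leverage(X_design, X_design).max()
    return leverage(X_design, X_new) <= h_max


# Hypothetical toy data: one covariate observed at 0, 1, 2, 3, 4.
x_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X_obs = np.column_stack([np.ones_like(x_obs), x_obs])  # intercept + slope

# Prediction locations: two within the sampled range, one far outside it.
x_new = np.array([2.0, 3.5, 6.0])
X_new = np.column_stack([np.ones_like(x_new), x_new])

flags = inside_ivh(X_obs, X_new)
for x, inside in zip(x_new, flags):
    label = "interpolation" if inside else "extrapolation"
    print(f"x = {x}: {label}")
```

With a single covariate the check reduces to asking whether the prediction point falls within the range of the sampled values; with several covariates it also flags unusual covariate combinations that lie within each covariate's marginal range.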
