Spatial leave‐one‐out cross‐validation for variable selection in the presence of spatial autocorrelation

Aim: Processes and variables measured in ecology are almost always spatially autocorrelated, which can lead to the choice of overly complex models when performing variable selection. One way to solve this problem is to account for residual spatial autocorrelation (RSA) for each subset of variables considered and then use a classical model selection criterion such as the Akaike information criterion (AIC). However, this method can be laborious, and it raises other concerns, such as which spatial model to use and how to compare different spatial models. To improve the accuracy of variable selection in ecology, this study evaluates an alternative method based on a spatial cross-validation procedure. Such a procedure is usually used for model evaluation but can also be useful for variable selection in the presence of spatial autocorrelation.

Innovation: We propose to use a special case of spatial cross-validation, spatial leave-one-out (SLOO), which gives a criterion equivalent to the AIC in the absence of spatial autocorrelation. SLOO only computes non-spatial models and uses a threshold distance (equal to the range of RSA) so that each left-out point is spatially independent of the points used to fit the model. We first provide simulations to evaluate how SLOO performs compared with AIC. We then assess the robustness of SLOO on a large-scale dataset. R code is provided for generalized linear models.

Main conclusions: The AIC was relevant for variable selection in the presence of RSA if the independent variables considered were not spatially autocorrelated. It otherwise failed because highly spatially autocorrelated variables were selected more often than others. Conversely, SLOO performed similarly whether or not the variables were themselves spatially autocorrelated. It was particularly useful when the range of RSA was small, which is a common property of spatial tools. SLOO appears to be a promising solution for selecting relevant variables from most ecological spatial datasets.
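The SLOO idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' R code: it assumes a Gaussian linear model fitted by ordinary least squares as a stand-in for a generalized linear model, and the function name `sloo_loglik`, its arguments, and the simulated data are hypothetical. For each point, every observation within the threshold distance is dropped from the training set, a non-spatial model is fitted to the rest, and the log-density of the left-out response is accumulated.

```python
import numpy as np

def sloo_loglik(X, y, coords, radius):
    """Sum of out-of-sample Gaussian log-densities under spatial leave-one-out.

    Hypothetical helper: X is (n, p), y is (n,), coords is (n, 2), and
    radius is the threshold distance (assumed to be the range of RSA).
    """
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])  # prepend an intercept column
    # Pairwise Euclidean distances between all sampling locations.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    total = 0.0
    for i in range(n):
        # Keep only points farther than `radius` from point i; this drops
        # point i itself (distance 0) together with its close neighbours.
        train = dist[i] > radius
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        resid = y[train] - X1[train] @ beta
        sigma2 = resid @ resid / train.sum()  # maximum-likelihood variance
        err = y[i] - X1[i] @ beta
        total += -0.5 * (np.log(2 * np.pi * sigma2) + err ** 2 / sigma2)
    return total

# Illustrative use on simulated data (all values are assumptions).
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

score_with_x = sloo_loglik(x[:, None], y, coords, radius=1.0)
score_null = sloo_loglik(np.empty((n, 0)), y, coords, radius=1.0)
print(score_with_x, score_null)  # the larger score is preferred
```

Candidate subsets of variables would be compared by this score, with the highest value preferred. Setting `radius=0` reduces the procedure to ordinary leave-one-out, provided no two points share the same coordinates.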
