Point Pattern Modeling for Degraded Presence-Only Data over Large Regions

Explaining the distribution of a species using local environmental features is a long-standing ecological problem. Often, the available data are collected as a set of presence locations only, thus precluding the desired presence-absence analysis. We propose that it is natural to view presence-only data as a point pattern over a region and to use local environmental features to explain the intensity driving this point pattern. We use a hierarchical model to treat the presence data as a realization of a spatial point process whose intensity is governed by the set of environmental covariates. Spatial dependence in the intensity levels is modeled with random effects involving a zero-mean Gaussian process. We augment the model to capture highly variable and typically sparse sampling effort as well as land transformation, both of which degrade the point pattern. The Cape Floristic Region (CFR) in South Africa provides an extensive class of such species data. The potential (i.e., nondegraded) presence surfaces over the entire area are of interest from a conservation and policy perspective. The region is divided into ∼37,000 grid cells. To work with a Gaussian process over such a large number of cells, we use a predictive spatial process approximation. Bias correction by adding a heteroscedastic error component has also been implemented. We illustrate with modeling for six different species. We also compare with the now popular Maxent approach, though the latter is limited with regard to inference. The resultant patterns are important on their own but also enable a comparative view, for example, to investigate whether a pair of species are potentially competing in the same area. An additional feature of our modeling is the opportunity to make inferences about biodiversity through species richness, i.e., the number of distinct species in an areal unit. Such investigation follows immediately within our modeling framework.
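
To make the idea of a degraded point pattern concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a log-linear intensity in the covariates, an exponential covariance for the zero-mean Gaussian process, and degradation that acts multiplicatively through an untransformed-land proportion and a binary sampling indicator per grid cell. None of these functional forms is stated in the abstract, and the function name, parameter names, and default values are all hypothetical.

```python
import numpy as np


def simulate_degraded_pattern(n_side=15, beta=(0.8, -0.5), sigma2=1.0,
                              phi=0.2, expected_total=500.0, seed=0):
    """Simulate per-cell presence counts from a degraded point pattern.

    All modeling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Centroids of an n_side x n_side grid on the unit square.
    centers = (np.arange(n_side) + 0.5) / n_side
    gx, gy = np.meshgrid(centers, centers)
    coords = np.column_stack([gx.ravel(), gy.ravel()])
    n_cells = coords.shape[0]

    # Hypothetical environmental covariates, one row per cell.
    X = rng.normal(size=(n_cells, len(beta)))

    # Zero-mean Gaussian process random effects (assumed exponential
    # covariance); a small jitter keeps the Cholesky factor stable.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma2 * np.exp(-dist / phi) + 1e-8 * np.eye(n_cells)
    w = np.linalg.cholesky(cov) @ rng.normal(size=n_cells)

    # Potential (nondegraded) expected count per cell, rescaled so the
    # whole region has expected_total points in expectation.
    potential = np.exp(X @ np.asarray(beta) + w)
    potential *= expected_total / potential.sum()

    # Degradation: proportion of each cell left untransformed, and a
    # sparse indicator of whether the cell was sampled at all.
    untransformed = rng.uniform(0.2, 1.0, size=n_cells)
    sampled = rng.binomial(1, 0.3, size=n_cells)

    # Observed counts are Poisson with the degraded expected count.
    degraded = potential * untransformed * sampled
    counts = rng.poisson(degraded)
    return potential, degraded, counts


potential, degraded, counts = simulate_degraded_pattern()
print("expected points, potential surface:", potential.sum())
print("expected points, degraded surface: ", degraded.sum())
print("observed points:", counts.sum())
```

In this sketch the degraded surface never exceeds the potential surface, and unsampled cells contribute no observed points, which is why inference about the potential surface has to account for sampling effort and land transformation explicitly.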
