Integrating multiple data sources in species distribution modeling: a framework for data fusion.

The last decade has seen a dramatic increase in the use of species distribution models (SDMs) to characterize patterns of species' occurrence and abundance. Efforts to parameterize SDMs often create a tension between the quality and quantity of data available to fit models. Estimation methods that integrate both standardized and non-standardized data types offer a potential solution to this tradeoff. Recently, several authors have developed approaches for jointly modeling two sources of data (one of high quality and one of lesser quality). We extend their work by allowing for explicit spatial autocorrelation in occurrence and detection error using a Multivariate Conditional Autoregressive (MVCAR) model, and we develop three models that share information in a less direct manner, resulting in more robust performance when the auxiliary data are of lesser quality. We describe these three new approaches ("Shared," "Correlation," "Covariates") for combining data sources and demonstrate their use in a case study of the Brown-headed Nuthatch in the Southeastern U.S. and through simulations. All three approaches that used the second data source improved out-of-sample predictions relative to a model fit to a single data source ("Single"). When the second data source is of high quality, the Shared model performs best, but the Correlation and Covariates models also perform well. When the second data source is of lesser quality, the Correlation and Covariates models perform better, suggesting they are robust alternatives when little is known about auxiliary data collected opportunistically or by citizen scientists. Methods that allow both data types to be used will maximize the useful information available for estimating species distributions.
