Can we model the probability of presence of species without absence data?

In ecological studies, it is useful to estimate the probability that a species occurs at given locations. When both presence and absence data are available, the probability of presence can be modeled with traditional statistical methods. The challenge is that most species records contain only presence data, with no reliable absence data. Previous presence-only methods can estimate a relative index of habitat suitability, but not the actual probability of presence. In this study, we develop a presence and background learning (PBL) algorithm that successfully models the conditional probability of presence of a simulated species. The model is trained on two completely separate data sets: observed presences and background data. Assuming that the probability of presence is one at ‘prototypical presence’ locations, where the habitat is maximally suitable for the species, we can estimate a constant that calibrates the trained model to the actual probability of presence. Experimental results show that the PBL method performs similarly to a presence-absence method and significantly better than the widely used maximum entropy method. The new algorithm makes it possible to model the probability that a species occurs, conditional on environmental covariates, without absence data. Hence, it has the potential to improve modeling of the geographical distributions of species.
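The calibration idea can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption rather than the authors' implementation. It assumes presences are drawn only from occupied sites and background points from the whole landscape, so that an ideal presence-versus-background classifier outputs a score g(x) whose odds are proportional to the probability of presence. Under that assumption, the score c at a prototypical presence location (where the probability is one) fixes the proportionality constant. The function name `calibrate`, the simulated species, and the sampling ratio are all hypothetical.

```python
import numpy as np

def calibrate(g, c):
    """Rescale presence-vs-background scores g into probabilities of presence.

    Assumes odds(g) is proportional to P(presence | x), and that c is the
    score at a location where P(presence | x) = 1 (a 'prototypical presence').
    """
    g = np.asarray(g, dtype=float)
    odds = g / (1.0 - g)
    return np.clip(odds * (1.0 - c) / c, 0.0, 1.0)

# Hypothetical simulated species on a one-dimensional environmental gradient.
x = np.linspace(-2.0, 2.0, 401)          # background: uniform over the gradient
p_true = np.exp(-x**2)                    # true probability of presence, max = 1
prevalence = p_true.mean()                # overall probability of presence

# Ideal classifier score for presence (label 1) versus background (label 0),
# with an assumed ratio of presence to background sample sizes.
ratio = 0.5                               # n_presence / n_background (assumed)
odds = ratio * p_true / prevalence
g = odds / (1.0 + odds)

# The constant c is the score at the most suitable location.
c = g[np.argmax(p_true)]

p_hat = calibrate(g, c)
print("max abs error:", np.max(np.abs(p_hat - p_true)))
```

In this idealized setting the raw score g is only a relative index, because it depends on the unknown prevalence and on the sample-size ratio, while the rescaled value recovers the simulated probability. With a fitted model, c would have to be estimated from the scores at locations believed to be prototypical presences, and the result would be only as good as that belief.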
