Modelling distribution and abundance with presence‐only data

Summary

1. Presence-only data, for which there is no information on locations where the species is absent, are common in both animal and plant studies. In many situations, these may be the only data available on a species. We need effective ways to use these data to explore species distribution or species use of habitat.

2. Many analytical approaches have been used to model presence-only data, some inappropriately. We provide a synthesis and critique of statistical methods currently in use to both estimate and evaluate these models, and discuss the critical importance of study design in models where only presence can be identified.

3. Profile or envelope methods exist to characterize environmental covariates that describe the locations where organisms are found. Predictions from profile approaches are generally coarse, but may be useful when species records, environmental predictors and biological understanding are scarce.

4. Alternatively, one can build models to contrast environmental attributes associated with known locations with a sample of random landscape locations, termed either ‘pseudo-absences’ or ‘available’. Great care needs to be taken when selecting random landscape locations, because the way in which they are selected determines the modelling techniques that can be applied.

5. Regression-based models can provide predictions of the relative likelihood of occurrence, and in some situations predictions of the probability of occurrence. The logistic model is frequently applied, but can rarely be used directly to estimate these models; instead, case–control or logistic discrimination should be used depending on the sample design.

6. Cross-validation can be used to evaluate model performance and to assess how effectively the model reflects a quantity proportional to the probability of occurrence. However, more research is needed to develop a single measure or statistic that summarizes model performance for presence-only data.

7. Synthesis and applications. A number of statistical procedures are available to explore patterns in presence-only data; the choice among them depends on the quality of the presence-only data. Presence-only records can provide insight into the vulnerability, historical distribution and conservation status of species. Models developed using these data can inform management. Our caveat is that researchers must be mindful of study design and the biases inherent in presence data, and be cautious in the interpretation of model predictions.
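
The contrast between known locations and random landscape locations, and the cross-validated check that predictions track a quantity proportional to the probability of occurrence, can be illustrated with a minimal sketch. It is not the authors' procedure. Everything in it is an assumption made for illustration: the simulated one-covariate landscape, the exponential form exp(slope * x) for relative likelihood, and the hypothetical helper names `fit_rsf_slope` and `kfold_rank_correlation`. The evaluation step bins available locations into equal-sized classes by predicted score and rank-correlates bin order with the number of withheld presences in each bin.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def fit_rsf_slope(used, available):
    """Contrast presence ('used') with random ('available') points.

    Only the slope is returned: the intercept reflects the sampling ratio,
    so exp(slope * x) is a relative likelihood, not a probability.
    """
    x = np.concatenate([used, available])
    y = np.concatenate([np.ones(used.size), np.zeros(available.size)])
    X = np.column_stack([np.ones(x.size), x])
    return fit_logistic(X, y)[1]


def average_ranks(v):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    v = np.asarray(v)
    ranks = np.empty(v.size)
    ranks[np.argsort(v, kind="stable")] = np.arange(1, v.size + 1)
    for val in np.unique(v):
        tie = v == val
        ranks[tie] = ranks[tie].mean()
    return ranks


def kfold_rank_correlation(used, available, k=5, n_bins=10, seed=1):
    """Per-fold Spearman correlation between bin order and withheld presences."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(used.size) % k  # balanced random fold labels
    results = []
    for fold in range(k):
        slope = fit_rsf_slope(used[folds != fold], available)
        # Equal-area bins: quantiles of the predicted score over available points.
        edges = np.quantile(np.exp(slope * available), np.linspace(0, 1, n_bins + 1))
        scores = np.exp(slope * used[folds == fold])
        bins = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, n_bins - 1)
        counts = np.bincount(bins, minlength=n_bins)
        r = np.corrcoef(np.arange(n_bins), average_ranks(counts))[0, 1]
        results.append(r)
    return np.array(results)


# Simulated landscape (an assumption): one covariate, true selection exp(1.0 * x).
rng = np.random.default_rng(0)
landscape = rng.normal(size=50_000)
true_beta = 1.0
w = np.exp(true_beta * landscape)
used = rng.choice(landscape, size=2_000, p=w / w.sum())  # presence records
available = rng.choice(landscape, size=10_000)           # random landscape sample

slope = fit_rsf_slope(used, available)
rank_corr = kfold_rank_correlation(used, available)
print("estimated slope:", round(float(slope), 3))
print("per-fold rank correlations:", np.round(rank_corr, 3))
```

In this simulation, a slope close to the simulated value and rank correlations close to 1 indicate that the fitted scores order locations consistently with relative use. Neither number yields an absolute probability of occurrence, because the ratio of presence to random points is set by the analyst.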
