A method for measuring the relative information content of data from different monitoring protocols

Summary

1. Species monitoring is an essential component of assessing conservation status, predicting effects of habitat change and establishing management and conservation priorities. The pervasive access to the Internet has led to the development of several extensive monitoring projects that engage massive networks of volunteers who provide observations following relatively unstructured protocols. However, the value of these data is largely unknown.

2. We develop a novel cross-data validation method for measuring the value of survey data from one source (e.g. an Internet checklist program) relative to a second, benchmark data source. The method fits a model to the data of interest and validates the model using benchmark data, allowing us to isolate the training data's information content from its biases. We also define a data efficiency ratio to quantify the relative efficiency of the data sources.

3. We apply our cross-data validation method to quantify the value of data collected in eBird – a western hemisphere, year-round citizen science bird checklist project – relative to data from the highly standardized North American Breeding Bird Survey (BBS). The results show that eBird data contain information similar in quality to that in BBS data, while the information per BBS datum is higher.

4. We suggest that these methods have more general use in evaluating the suitability of sources of data for addressing specific questions for taxa of interest.
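The cross-data validation idea in the summary (fit a model to one data source, then score it on held-out benchmark data) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data generator `simulate`, the plain logistic regression used as a stand-in model, the choice of AUC as the score, and the helper `samples_to_reach`. The summary does not give the formula for the data efficiency ratio, so the ratio mentioned in the closing note is only one plausible reading.

```python
import numpy as np

def simulate(n, label_noise, rng):
    """Hypothetical survey data: two covariates, presence/absence labels.
    label_noise is the chance a label is flipped (a crude stand-in for a
    less structured protocol)."""
    X = rng.normal(size=(n, 2))
    p = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1])))
    y = (rng.random(n) < p).astype(int)
    flip = rng.random(n) < label_noise
    y[flip] = 1 - y[flip]
    return X, y

def fit_logistic(X, y, steps=500, lr=0.1):
    """Logistic regression by gradient descent (stand-in for any model)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney form); assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def cross_data_score(X_train, y_train, X_bench, y_bench):
    """Train on the source of interest, validate on benchmark data only."""
    w = fit_logistic(X_train, y_train)
    return auc(y_bench, predict(w, X_bench))

def samples_to_reach(sizes, scores, target):
    """Smallest training size whose score meets the target, else None."""
    for n, s in zip(sizes, scores):
        if s >= target:
            return n
    return None

rng = np.random.default_rng(0)
# Benchmark data are split: one part for training, one held out for validation.
X_bench_test, y_bench_test = simulate(500, 0.0, rng)
X_bench_train, y_bench_train = simulate(500, 0.0, rng)
X_other_train, y_other_train = simulate(500, 0.2, rng)

# Both models are scored on the same held-out benchmark set.
score_bench = cross_data_score(X_bench_train, y_bench_train, X_bench_test, y_bench_test)
score_other = cross_data_score(X_other_train, y_other_train, X_bench_test, y_bench_test)
```

Because both models are validated on the same benchmark set, the difference between `score_other` and `score_bench` reflects what each training source contributes rather than how each source would score against its own, possibly biased, observations. Repeating the comparison over a range of training sizes and passing the resulting curves to `samples_to_reach` gives, under the assumption stated above, one way to form an efficiency ratio: the number of records from the source of interest divided by the number of benchmark records needed to reach the same target score.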
