Optimizations for the EcoPod field identification tool

Background
We sketch our species identification tool for palm-sized computers, which helps knowledgeable observers with census activities. An algorithm turns an identification matrix into a minimal-length series of questions that guide the operator towards identification. Historic observation data from the census geographic area helps minimize the number of questions. We explore how much historic data is required to boost performance, and whether the use of history negatively impacts identification of rare species. We also explore how characteristics of the matrix interact with the algorithm, and how best to predict the probability of observing a previously unseen species.

Results
Point counts of birds taken at Stanford University's Jasper Ridge Biological Preserve between 2000 and 2005 were used to examine the algorithm. A computer identified species by correctly answering the algorithm's questions and counting them. We also explored how the character density of the key matrix and the theoretical minimum number of questions for each bird in the matrix influenced the algorithm. Our investigation of the required probability smoothing determined whether Laplace smoothing of observation probabilities was sufficient, or whether the more complex Good-Turing technique was required.

Conclusion
Historic data improved identification speed, but only for the 25% most frequently observed birds. For rare birds, the history-based algorithms did not impose a noticeable penalty in the number of questions required for identification. For our dataset, neither the age of the historic data nor the number of observation years impacted the algorithm. Density of characters for different taxa in the identification matrix did not impact the algorithms. Intrinsic differences in identifying different birds did affect the algorithm, but they affected the baseline method, which uses no historic data, to exactly the same degree. We found that Laplace smoothing performed better for rare species than Simple Good-Turing, and that, contrary to expectation, the technique did not then adversely affect identification performance for frequently observed birds.
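
To make the role of smoothed observation history concrete, the following is a minimal sketch, not the EcoPod implementation. It assumes add-one (Laplace) smoothing of historic counts and a greedy, entropy-based choice of the next yes/no question; the abstract does not describe the selection rule, so that part is an illustrative assumption. The species names, characters, counts, and function names are all hypothetical.

```python
import math

def laplace_probabilities(counts, species, alpha=1.0):
    """Add-alpha (Laplace) smoothing: every species in the matrix, including
    ones never observed, receives a non-zero prior probability."""
    total = sum(counts.get(s, 0) for s in species) + alpha * len(species)
    return {s: (counts.get(s, 0) + alpha) / total for s in species}

def expected_entropy_after(question, candidates, matrix, prior):
    """Expected entropy (in bits) over the remaining candidates after the
    observer answers one yes/no question. Assumed criterion, for illustration."""
    mass = sum(prior[s] for s in candidates)
    expected = 0.0
    for answer in (True, False):
        group = [s for s in candidates if matrix[s][question] == answer]
        group_mass = sum(prior[s] for s in group)
        if group_mass == 0:
            continue  # this answer is impossible for the current candidates
        entropy = -sum(
            (prior[s] / group_mass) * math.log2(prior[s] / group_mass)
            for s in group
        )
        expected += (group_mass / mass) * entropy
    return expected

def best_question(candidates, matrix, prior, questions):
    """Greedy step: pick the question that leaves the least expected entropy."""
    return min(
        questions,
        key=lambda q: expected_entropy_after(q, candidates, matrix, prior),
    )

# Hypothetical identification matrix: species -> character -> present?
matrix = {
    "scrub_jay":    {"has_red_eye": False, "is_blue": True},
    "towhee":       {"has_red_eye": True,  "is_blue": False},
    "rare_warbler": {"has_red_eye": False, "is_blue": False},
}
species = list(matrix)
questions = ["has_red_eye", "is_blue"]

# Hypothetical historic counts; rare_warbler has never been observed.
historic_counts = {"scrub_jay": 8, "towhee": 1}

prior = laplace_probabilities(historic_counts, species)
first_question = best_question(species, matrix, prior, questions)
```

With these made-up counts the unseen species still gets a prior of 1/12, so it remains reachable, while the frequently observed species dominates the choice of the first question. That mirrors the abstract's finding that history speeds up common identifications without shutting out rare ones, under the assumptions stated above.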
