UvA-DARE (Digital Academic Repository)

A semantically integrated, user-friendly data model for species observation data

Large-scale ecological research has become increasingly important in recent decades, driven by growing awareness of the global influence of human activities on the biosphere. Such research requires species observation data covering many years, large areas and a broad range of taxonomic groups. Because individual data sets often cover small areas and have been collected using different methods, they can be combined in a single analysis only if they are made available in one place and translated into a common format. Over the past decade, catalysed by the growth of the Internet, various technologies for data dissemination and data integration have been developed and applied in projects such as the Global Biodiversity Information Facility, the Knowledge Network for Biocomplexity, BioCASE and the British National Biodiversity Network (NBN). In the Netherlands, data are now made available through the National Database of Flora and Fauna (NDFF), which currently contains approximately 40 million observation records covering a broad variety of species. The NDFF uses a standardised, semantically integrated data model to combine species observation data of various kinds effectively. In this paper, we evaluate this approach and the NDFF data model by comparing them with Darwin Core, Access to Biological Collections Data (ABCD) and the Recorder 2000 model used by the NBN. We conclude that the high degree of standardisation in the NDFF data model has led to somewhat higher data conversion costs, but also to better semantic integration and greater ease of use of species observation data. Together with the relative simplicity, completeness and flexibility of the model, this enables species observations to be reused effectively and in a user-friendly manner.
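The abstract's central point is that data sets collected with different methods can be pooled only after they have been translated into a common format. The sketch below illustrates that translation step under stated assumptions. It does not show the NDFF data model. The two source formats and both converter functions are hypothetical, and the common record simply borrows Darwin Core term names (`scientificName`, `eventDate`, `individualCount`, `samplingProtocol`, and so on).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical common record; field names borrow Darwin Core terms."""
    scientificName: str
    eventDate: date
    decimalLatitude: float
    decimalLongitude: float
    individualCount: Optional[int]  # None when the source records presence only
    samplingProtocol: str


def from_transect_row(row: dict) -> Occurrence:
    """Hypothetical source A: transect counts stored as strings under Dutch column names."""
    return Occurrence(
        scientificName=row["soort"],                  # "soort" = species
        eventDate=date.fromisoformat(row["datum"]),   # ISO date, e.g. "2010-06-14"
        decimalLatitude=float(row["lat"]),
        decimalLongitude=float(row["lon"]),
        individualCount=int(row["aantal"]),           # "aantal" = number counted
        samplingProtocol="transect count",
    )


def from_casual_sighting(rec: dict) -> Occurrence:
    """Hypothetical source B: presence-only sightings with DD/MM/YYYY dates."""
    day, month, year = (int(part) for part in rec["date"].split("/"))
    lat, lon = rec["coords"]
    return Occurrence(
        scientificName=rec["species"],
        eventDate=date(year, month, day),
        decimalLatitude=float(lat),
        decimalLongitude=float(lon),
        individualCount=None,  # no count was recorded, so none is invented
        samplingProtocol="incidental observation",
    )


# Once translated, records from both sources can be pooled in one analysis.
records = [
    from_transect_row({"soort": "Lanius collurio", "datum": "2010-06-14",
                       "lat": "52.10", "lon": "5.20", "aantal": "3"}),
    from_casual_sighting({"species": "Lanius collurio", "date": "02/07/2010",
                          "coords": (52.30, 4.90)}),
]

# Sum only the records that actually carry a count.
counted = sum(r.individualCount for r in records if r.individualCount is not None)
```

The example keeps `individualCount` optional instead of defaulting it to 1. This preserves the distinction between counted and presence-only records, which matters when data collected with different methods are analysed together.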
