A metadata-driven approach to loading and querying heterogeneous scientific data

Abstract

The Ecological Metadata Language (EML) is an effective specification for describing data for long-term storage and interpretation. When used with a metadata repository such as Metacat and a metadata editing tool such as Morpho, EML allows a large community of researchers to access and share their data. Although the EML/Morpho/Metacat toolkit provides a rich mechanism for documenting data, current methods for retrieving the data that EML describes can be laborious and time-consuming. Moreover, the structural and semantic heterogeneity of ecological data sets makes developing custom solutions for integrating and querying these data prohibitively costly for large-scale synthesis. The Data Manager Library leverages EML to provide automated data processing features that allow efficient data access, querying, and manipulation without custom development. The library supports many data management tasks and was designed to be immediately useful while remaining extensible and easy to incorporate into existing applications. In this paper we describe the motivation for developing the Data Manager Library, give an overview of its implementation, illustrate potential uses by describing several planned and existing deployments, and outline future work to extend the library.
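To make the metadata-driven idea concrete, here is a minimal Python sketch. It is not the Data Manager Library's actual API. It assumes a trimmed, EML-like description of a single comma-delimited table (element names follow EML 2.x conventions, but the fragment is simplified and not schema-valid), a hypothetical mapping from EML measurement scales to SQL column types, and an in-memory SQLite database standing in for a relational back end. The function names `describe_table` and `load_table` are hypothetical. The point it illustrates is that the table definition, field delimiter, and header handling all come from the metadata, so no dataset-specific parsing code is written.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# A trimmed, EML-like description of one data table. Element names follow
# EML 2.x conventions, but this fragment is illustrative and not schema-valid.
EML_FRAGMENT = """
<dataTable>
  <entityName>plots</entityName>
  <physical>
    <dataFormat>
      <textFormat>
        <numHeaderLines>1</numHeaderLines>
        <simpleDelimited><fieldDelimiter>,</fieldDelimiter></simpleDelimited>
      </textFormat>
    </dataFormat>
  </physical>
  <attributeList>
    <attribute><attributeName>site</attributeName>
      <measurementScale><nominal/></measurementScale></attribute>
    <attribute><attributeName>year</attributeName>
      <measurementScale><interval/></measurementScale></attribute>
    <attribute><attributeName>biomass</attributeName>
      <measurementScale><ratio/></measurementScale></attribute>
  </attributeList>
</dataTable>
"""

# Hypothetical mapping from EML measurement scales to SQLite column types.
SCALE_TO_SQL = {"nominal": "TEXT", "ordinal": "TEXT", "dateTime": "TEXT",
                "interval": "REAL", "ratio": "REAL"}


def describe_table(eml_xml):
    """Read table name, delimiter, header count, and typed columns from metadata."""
    root = ET.fromstring(eml_xml)
    name = root.findtext("entityName")
    text_format = root.find("physical/dataFormat/textFormat")
    delimiter = text_format.findtext("simpleDelimited/fieldDelimiter")
    header_lines = int(text_format.findtext("numHeaderLines", default="0"))
    columns = []
    for attr in root.iter("attribute"):
        # The first child of measurementScale names the scale, e.g. "ratio".
        scale = attr.find("measurementScale")[0].tag
        columns.append((attr.findtext("attributeName"), SCALE_TO_SQL[scale]))
    return name, delimiter, header_lines, columns


def load_table(conn, eml_xml, raw_text):
    """Create a table from the metadata, then load the delimited data into it."""
    name, delimiter, header_lines, columns = describe_table(eml_xml)
    col_defs = ", ".join(f'"{col}" {sql_type}' for col, sql_type in columns)
    conn.execute(f'CREATE TABLE "{name}" ({col_defs})')
    # Skip the header lines the metadata declares; SQLite's type affinity
    # converts numeric strings into REAL values for REAL columns.
    rows = list(csv.reader(io.StringIO(raw_text), delimiter=delimiter))[header_lines:]
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
    return name


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    data = "site,year,biomass\nA,2005,12.5\nB,2005,8.0\nA,2006,14.1\n"
    table = load_table(conn, EML_FRAGMENT, data)
    # Once loaded, the data can be queried with ordinary SQL.
    query = f'SELECT site, AVG(biomass) FROM "{table}" GROUP BY site ORDER BY site'
    print(conn.execute(query).fetchall())
```

Because the loader reads its schema from the metadata, the same two functions would handle a different table simply by being given that table's description. A real system would also need to honor missing-value codes, units, date formats, and the many other physical formats EML can describe, none of which this sketch attempts.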
