Experiences with DERIVA: An Asset Management Platform for Accelerating eScience

The pace of discovery in eScience increasingly depends on a scientist’s ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. All too often, investigators spend inordinate amounts of time developing ad hoc procedures to manage their data. In previous work, we presented DERIVA, a Scientific Asset Management System designed to accelerate data-driven discovery. In this paper, we report on the use of DERIVA in a number of substantial and diverse eScience applications. We describe the lessons we have learned, both about the DERIVA technology itself and about the ability and willingness of scientists to incorporate Scientific Asset Management into their daily workflows.
