An Identity Crisis in the Life Sciences

myGrid is an e-Science project that helps life scientists build workflows that gather data from distributed, autonomous, replicated and heterogeneous resources. The provenance log of each workflow execution is recorded as an RDF graph, and the log of a single run can be used to trace the history of that run. By aggregating the provenance logs of many workflow runs, however, one can also gather the provenance of a data product that is shared across multiple derivation paths. Successful aggregation relies on accurate and universal identification of each data product, which the nature of bioinformatics data and services makes difficult. We describe the identity problem in bioinformatics data and present a protocol for managing identity co-references and allocating identities to gathered and computed data products. Overcoming this problem means that workflow provenance in bioinformatics and other domains can be exploited to enhance the practice of e-Science.
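
To make the aggregation idea concrete, the following is a minimal sketch, not the protocol described in the paper. It assumes provenance logs can be reduced to (subject, predicate, object) triples and that co-references between identifiers are declared explicitly. The class `CoReferences`, the function `aggregate`, the `derivedFrom` predicate and all identifiers are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class CoReferences:
    """Tracks which identifiers denote the same data product (union-find)."""

    def __init__(self):
        self._parent = {}

    def find(self, ident):
        """Return the canonical identifier for `ident`."""
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def declare_same(self, a, b):
        """Record that identifiers `a` and `b` are co-references."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Assumption: the lexicographically smaller id becomes canonical.
            self._parent[max(ra, rb)] = min(ra, rb)


def aggregate(runs, coref):
    """Merge per-run triples, rewriting identifiers to canonical form.

    `runs` maps a run id to a list of (subject, predicate, object) triples.
    Returns a dict mapping each canonical triple to the set of runs
    in which it was observed.
    """
    merged = defaultdict(set)
    for run_id, triples in runs.items():
        for s, p, o in triples:
            merged[(coref.find(s), p, coref.find(o))].add(run_id)
    return merged


# Two runs refer to the same input sequence under different (made-up) ids.
coref = CoReferences()
coref.declare_same("urn:lsid:example.org:seq:P12345",
                   "http://example.org/db/P12345")
runs = {
    "run1": [("run1:blastReport", "derivedFrom",
              "urn:lsid:example.org:seq:P12345")],
    "run2": [("run2:alignment", "derivedFrom",
              "http://example.org/db/P12345")],
}
merged = aggregate(runs, coref)
```

Once the co-reference is declared, both derived products point at one canonical input identifier, so the two derivation paths can be queried together. Without it, the aggregated log would treat the shared input as two unrelated data products.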
