Provenance Explorer: a graphical interface for constructing scientific publication packages from provenance trails

Scientific communities are under increasing pressure from funding organizations to publish their raw data in open archives, in addition to their traditional publications. Many scientists would be willing to do this if they had tools that streamlined the process and exposed simple provenance information, that is, enough to explain the methodology and validate the results without compromising the author’s intellectual property or competitive advantage. This paper presents Provenance Explorer, a tool that enables the provenance trail associated with a scientific discovery process to be visualized and explored through a graphical user interface (GUI). Based on RDF graphs, it displays the sequence of data, states and events associated with a scientific workflow, illustrating the methodology that led to the published results. The GUI also allows authorized users to expand selected links between nodes to reveal more fine-grained information and sub-workflows. More importantly, the system enables scientists to selectively construct “scientific publication packages” by choosing particular nodes from the visual provenance trail and dragging and dropping them into an RDF package, which can then be uploaded to an archive or repository for publication or e-learning. The provenance relationships between the individual components in the package are inferred automatically by a rule-based inference engine.
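The abstract does not spell out the inference rules, so the following is only a minimal sketch of the idea, not Provenance Explorer's implementation. It assumes the provenance trail is held as plain (subject, predicate, object) triples and that a single transitivity rule over a hypothetical `derivedFrom` property is enough. Where the real system uses RDF and a rule engine, this version swaps in a plain graph walk. When a scientist drops only some nodes into a package, any direct link between two selected nodes is kept, and any `derivedFrom` chain that passes only through unselected intermediate nodes is collapsed into one inferred link. The node names, `TRAIL`, and `build_package` are all illustrative.

```python
from collections import defaultdict

# Hypothetical provenance trail as (subject, predicate, object) triples.
# Node names and predicates are illustrative, not the tool's vocabulary.
TRAIL = [
    ("figure3", "derivedFrom", "fitResults"),
    ("fitResults", "derivedFrom", "cleanedData"),
    ("cleanedData", "derivedFrom", "rawSpectra"),
    ("rawSpectra", "generatedBy", "experiment42"),
]


def build_package(trail, selected, predicate="derivedFrom"):
    """Return the triples that link only the selected nodes.

    Direct triples between two selected nodes are copied unchanged.
    A chain of `predicate` edges through unselected nodes is collapsed
    into a single inferred edge (a simple transitivity rule).
    """
    selected = set(selected)

    # Keep every direct relationship whose endpoints were both selected.
    package = {(s, p, o) for s, p, o in trail if s in selected and o in selected}

    # Adjacency list for the transitive predicate only.
    edges = defaultdict(set)
    for s, p, o in trail:
        if p == predicate:
            edges[s].add(o)

    for start in selected:
        # Depth-first walk that stops at the first selected node on each path,
        # so hidden intermediate steps are skipped over but not exposed.
        stack = list(edges[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen or node == start:
                continue
            seen.add(node)
            if node in selected:
                package.add((start, predicate, node))
            else:
                stack.extend(edges[node])

    return sorted(package)


if __name__ == "__main__":
    # Publish the figure and the raw data, but hide the intermediate analysis.
    for triple in build_package(TRAIL, {"figure3", "rawSpectra", "experiment42"}):
        print(triple)
```

In this example the published package links `figure3` straight to `rawSpectra` and keeps the `generatedBy` link to the experiment, while `fitResults` and `cleanedData` stay out of the package. This matches the abstract's aim of explaining the methodology without exposing every intermediate step. A production system would more likely express the same rule declaratively, for example as a transitive property or a rule in the inference engine, instead of walking the graph by hand.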
