Linked provenance data: A semantic Web-based approach to interoperable workflow traces

The Third Provenance Challenge (PC3) offered provenance researchers an opportunity to evaluate the interoperability of leading provenance models, with special emphasis on importing and querying workflow traces generated by others. We investigated interoperability issues related to reusing Open Provenance Model (OPM)-based workflow traces. We compiled data about the interoperability issues observed during PC3 and used that data to describe and motivate solution paths for two outstanding issues in the reuse of OPM-based provenance data: (i) a provenance trace often requires both generic provenance data and domain-specific data to support future reuse (such as querying); and (ii) diverse provenance traces (possibly from different sources) often require preservation and interconnection to support future aggregation and comparison. To address these issues and to facilitate interoperable reuse, integration, and alignment of provenance data, we propose a Semantic Web-based approach known as Linked Provenance Data, in which (i) the Web Ontology Language (OWL) can be used to support complex domain concept modeling, such as subtype taxonomies and concept alignment, and to seamlessly connect domain extensions to OPM core concepts; and (ii) Linked Data can enable an open and transparent infrastructure for provenance data reuse.
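As an illustration of point (i), the following minimal sketch shows how domain-specific classes declared as subtypes of OPM core concepts let a generic OPM-level query still find domain-typed resources. It is not the implementation described in this work: triples are plain Python tuples rather than an RDF store, the `ex:` names (`ex:CSVFile`, `ex:LoadStep`, `ex:detections.csv`, `ex:load1`) are hypothetical, the prefixed names stand in for full URIs, and only `rdfs:subClassOf` inference is emulated.

```python
# Illustrative prefixed names; real data would use full URIs.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# A tiny trace: hypothetical domain classes (ex:) extend OPM core concepts.
triples = {
    ("ex:CSVFile", SUBCLASS_OF, "opm:Artifact"),
    ("ex:LoadStep", SUBCLASS_OF, "opm:Process"),
    ("ex:detections.csv", RDF_TYPE, "ex:CSVFile"),
    ("ex:load1", RDF_TYPE, "ex:LoadStep"),
    ("ex:load1", "opm:used", "ex:detections.csv"),
}


def superclasses(cls, graph):
    """Return cls together with every class reachable via rdfs:subClassOf."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def instances_of(cls, graph):
    """Return resources whose asserted type is cls or a subclass of cls."""
    return sorted(
        s
        for s, p, o in graph
        if p == RDF_TYPE and cls in superclasses(o, graph)
    )


# Generic OPM-level queries succeed even though the trace only asserts
# domain-specific types.
artifacts = instances_of("opm:Artifact", triples)
processes = instances_of("opm:Process", triples)
print(artifacts)
print(processes)
```

A consumer that knows only the OPM core vocabulary can thus query the trace without understanding the domain extension, while a domain-aware consumer can still filter on the more specific `ex:` types.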
