A Semantic Web approach to the provenance challenge

Provenance is critically important for scientific workflow systems because it allows users to verify data, repeat experiments, and discover dependencies. The Semantic Web is a natural fit for representing provenance: it provides explicit support for representing and inferring connections between data and processes, as well as for annotating data. In this article, we present a Semantic Web approach to the Provenance Challenge (Concurrency Computat.: Pract. Exper. 2007; DOI: 10.1002/cpe.1233). We use web services, ontologies, OWL reasoners, triple stores, and the SPARQL query language to implement the workflow, represent the data and the connections within it, and execute queries. We implemented and successfully answered all of the challenge queries. The flexibility of the Semantic Web also makes it easy to convert other provenance systems' data representations into a form our system can work with. We illustrate this by integrating data from the PASS approach into our system and successfully executing all of the challenge queries on that data as well. Copyright © 2007 John Wiley & Sons, Ltd.