OPQL: A First OPM-Level Query Language for Scientific Workflow Provenance

Provenance is metadata that captures the derivation history of a data product, including its original data sources, intermediate products, and the steps applied to produce it. It has become increasingly important in services computing and scientific workflows for validating, interpreting, and analyzing the results of scientific computing. Most existing systems store captured provenance data in their own provenance stores under proprietary provenance models, and process queries over those physical stores using query languages such as SQL, SPARQL, and XQuery, which are closely coupled to the underlying storage strategies. In this paper, we present OPQL, an OPM-level provenance query language that is defined directly over the Open Provenance Model (OPM). An OPQL query takes an OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our main contributions are: (i) we design OPQL, including graph patterns and an OPM-based graph algebra for OPQL, that efficiently supports provenance lineage queries; and (ii) we implement OPQL in our OPMPROV system, where the result of an OPQL query is displayed as an OPM graph via the OPMPROV browser. An experimental study is conducted to evaluate the performance and feasibility of OPQL for provenance querying. To the best of our knowledge, OPQL is the first OPM-level query language for scientific workflow provenance.
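
The graph-in, graph-out property described above can be illustrated with a small sketch. This is not OPQL syntax or the OPMPROV implementation; it is a minimal Python illustration, assuming a toy in-memory OPM graph restricted to `used` and `wasGeneratedBy` edges. The names `Edge`, `OPMGraph`, and `lineage` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Edge:
    effect: str  # dependent node (edge source in OPM)
    cause: str   # node it depends on (edge destination in OPM)
    kind: str    # dependency type, e.g. "used" or "wasGeneratedBy"


@dataclass
class OPMGraph:
    # node id -> "artifact" or "process" (hypothetical toy representation)
    nodes: Dict[str, str] = field(default_factory=dict)
    edges: Set[Edge] = field(default_factory=set)


def lineage(graph: OPMGraph, start: str) -> OPMGraph:
    """Return the subgraph of everything `start` was derived from.

    The result is itself an OPMGraph, so lineage queries can be chained.
    """
    result = OPMGraph(nodes={start: graph.nodes[start]})
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for edge in graph.edges:
            # Follow dependency edges from effect to cause.
            if edge.effect == node and edge not in result.edges:
                result.edges.add(edge)
                if edge.cause not in result.nodes:
                    result.nodes[edge.cause] = graph.nodes[edge.cause]
                    frontier.append(edge.cause)
    return result


# Toy workflow: a1 -> p1 -> a2 -> p2 -> a3, plus an unrelated branch a1 -> p3 -> a4.
g = OPMGraph(
    nodes={"a1": "artifact", "a2": "artifact", "a3": "artifact", "a4": "artifact",
           "p1": "process", "p2": "process", "p3": "process"},
    edges={
        Edge("p1", "a1", "used"), Edge("a2", "p1", "wasGeneratedBy"),
        Edge("p2", "a2", "used"), Edge("a3", "p2", "wasGeneratedBy"),
        Edge("p3", "a1", "used"), Edge("a4", "p3", "wasGeneratedBy"),
    },
)

full = lineage(g, "a3")       # everything a3 was derived from
nested = lineage(full, "a2")  # a second query over the first query's result
```

Because the result has the same type as the input, `nested` is computed over `full` without any reference to how the original graph is stored, which is the kind of storage independence the abstract attributes to OPQL.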
