A Logic Programming Approach to Scientific Workflow Provenance Querying

Scientific workflows have become increasingly important for enabling and accelerating scientific discovery. A growing number of scientists and researchers rely on workflow systems to integrate and structure heterogeneous local and remote data and services in order to perform in silico experiments. To support the understanding, validation, and reproduction of scientific results, provenance querying and management have become critical components of scientific workflows. In this paper, we propose a logic programming approach to scientific workflow provenance querying and management, with the following contributions: i) we identify a set of characteristics that are desirable for a scientific workflow provenance query language; ii) based on these requirements, we propose FLOQ, a Frame Logic-based query language for scientific workflow provenance; iii) we demonstrate that our previous relational-database-based provenance model, the virtual data schema, can be easily mapped to the FLOQ model; and iv) we show by example that FLOQ is expressive enough to formulate common provenance queries, including all the queries proposed in the Provenance Challenge series.
