A navigation model for exploring scientific workflow provenance graphs

Many scientific workflow systems record provenance information in the form of data and process dependencies as part of workflow execution. Users often wish to explore these dependencies to reproduce, validate, and explain workflow results, e.g., by examining the data and processes that were used to produce particular workflow outputs. A natural interface for finding relevant provenance information, adopted by many systems, is to display the complete provenance dependency graph. For many workflows, however, provenance graphs are large, with thousands of nodes and edges or more. Displaying an entire provenance graph for such workflows can result in "provenance overload," where the sheer amount of available provenance information makes it difficult for users to find what is relevant and to explore data and process dependencies. In this paper, we address the challenges of "provenance overload" through a novel navigation model that provides operations for creating different views of provenance graphs, along with approaches for easily navigating between these views. Further, our proposed navigation model provides an integrated approach for exploring, summarizing, and querying portions of provenance graphs. We also discuss different architectures for efficiently navigating large provenance graphs stored in an underlying provenance database.
