PDiffView: Viewing the Difference in Provenance of Workflow Results

Scientific workflow systems are becoming increasingly important for managing in-silico experiments. Such experiments are typically specified as directed flow graphs, in which nodes represent modules and edges represent data flow between modules. Each execution (or run) of an experiment may vary the parameters and data inputs to the modules in the specification; furthermore, alternative paths through the workflow may be followed. Throughout this process, the scientist's goal is to identify the parameter settings and approaches that lead to good final results. Comparing executions of the same specification and understanding how they differ is therefore essential to understanding the provenance of final results [4].

[1] Juliana Freire, et al. Provenance and scientific workflows: challenges and opportunities, 2008, SIGMOD Conference.

[2] Eugene L. Lawler, et al. The recognition of Series Parallel digraphs, 1979, SIAM J. Comput.

[3] Sanjeev Khanna, et al. Differencing Provenance in Scientific Workflows, 2009, IEEE 25th International Conference on Data Engineering.

[4] James Cheney, et al. Curated databases, 2008, PODS.