Structural summaries for visual provenance analysis

Many systems today collect provenance that describes how data was derived. Such provenance is useful in many settings, e.g., for assessing reproducibility or the quality of a derivation process. Depending on the use case, the collected provenance traces need to be explored and analyzed, and various approaches, including visual analysis approaches, have been proposed for this purpose. However, these typically focus on analyzing individual provenance traces. We propose to create structure-based summaries of provenance by aggregating many provenance traces provided in W3C-PROV representation. We further describe the analysis tasks that apply to these summaries. Using several use cases, we showcase the usefulness of structural summaries when they are combined with appropriate visualization and interaction techniques.
