AVOCADO: Visualization of Workflow-Derived Data Provenance for Reproducible Biomedical Research

A major challenge in data-driven biomedical research lies in collecting and representing data provenance information so that findings are reproducible. To communicate and reproduce multi-step analysis workflows executed on datasets that contain data for dozens or hundreds of samples, it is crucial to be able to visualize the provenance graph at different levels of aggregation. Most existing approaches are based on node-link diagrams, which do not scale to the complexity of typical data provenance graphs. In our proposed approach, we reduce the complexity of the graph using hierarchical and motif-based aggregation. Based on user actions and graph attributes, a modular degree-of-interest (DoI) function is applied to expand the parts of the graph that are relevant to the user. This interest-driven adaptive approach to provenance visualization allows users to review and communicate complex multi-step analyses, which can be based on hundreds of files that are processed by numerous workflows. We have integrated our approach into an analysis platform that captures extensive data provenance information, and we demonstrate its effectiveness by means of a biomedical usage scenario.
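The idea of a modular DoI function driving the expansion of an aggregated graph can be illustrated with a minimal Python sketch. It is not the AVOCADO implementation: the `Node` class, the component names (`selected`, `recency`), the weights, and the threshold are all hypothetical, and the DoI is assumed here to be a normalized weighted sum of per-node components. An aggregate node is expanded only if it or one of its descendants reaches the threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A node in a hypothetical aggregation hierarchy over a provenance graph."""
    name: str
    attrs: Dict[str, float] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


Component = Callable[[Node], float]


def make_doi(components: Dict[str, Component],
             weights: Dict[str, float]) -> Component:
    """Combine independent DoI components into one normalized weighted sum."""
    total = sum(weights[k] for k in components)

    def doi(node: Node) -> float:
        return sum(weights[k] * components[k](node) for k in components) / total

    return doi


def max_doi(node: Node, doi: Component) -> float:
    """Highest DoI found in the subtree rooted at `node`."""
    return max([doi(node)] + [max_doi(c, doi) for c in node.children])


def visible_nodes(root: Node, doi: Component, threshold: float) -> List[str]:
    """Expand an aggregate only if something inside it is interesting enough."""
    if not root.children or max_doi(root, doi) < threshold:
        return [root.name]  # leaf, or aggregate that stays collapsed
    out: List[str] = []
    for child in root.children:
        out.extend(visible_nodes(child, doi, threshold))
    return out


# Hypothetical example: two workflows, each applied to two samples.
selected = {"align_s1"}  # assumed user action: one node was clicked
components: Dict[str, Component] = {
    "selected": lambda n: 1.0 if n.name in selected else 0.0,
    "recency": lambda n: n.attrs.get("recency", 0.0),  # assumed graph attribute
}
weights = {"selected": 0.7, "recency": 0.3}
doi = make_doi(components, weights)

root = Node("all", children=[
    Node("workflow_A", children=[
        Node("align_s1", {"recency": 0.2}),
        Node("align_s2", {"recency": 0.2}),
    ]),
    Node("workflow_B", children=[
        Node("peak_s1", {"recency": 0.1}),
        Node("peak_s2", {"recency": 0.1}),
    ]),
])

print(visible_nodes(root, doi, threshold=0.5))
```

In this sketch the workflow containing the selected node is expanded while the other workflow stays collapsed as a single aggregate node. Because the components are kept separate and combined only through the weights, a component can be added, removed, or re-weighted without touching the expansion logic.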
