DfAnalyzer: Runtime dataflow analysis tool for Computational Science and Engineering applications

Abstract: DfAnalyzer is a tool for monitoring, debugging, and analyzing dataflows generated by Computational Science and Engineering (CSE) applications. It collects strategic raw data, registers provenance data, and enables query processing, all asynchronously and at runtime. DfAnalyzer provides lightweight dataflow components that CSE applications running on High-Performance Computing (HPC) platforms can invoke, in the same way computational scientists plug in HPC libraries (e.g., PETSc) and visualization libraries (e.g., ParaView). We show DfAnalyzer’s main functionalities and how to analyze dataflows in CSE applications at runtime. A performance evaluation of CSE executions for a complex multiphysics application shows that DfAnalyzer adds negligible overhead to the total elapsed time.
