Performance analysis and optimization of in-situ integration of simulation with data analysis: zipping applications up

This paper targets an important class of applications that combine HPC simulations with data analysis for online or real-time scientific discovery. We use state-of-the-art parallel I/O and data-staging libraries to build simulation-time data analysis workflows, and analyze their performance with real-world computational fluid dynamics (CFD) and molecular dynamics (MD) simulations. Driven by an in-depth analysis of the performance inefficiencies, we design an end-to-end application-level approach that eliminates the interlocks and synchronizations present in existing methods. The new approach employs both task parallelism and pipeline parallelism to reduce synchronization effectively. In addition, we design a fully asynchronous, fine-grained, pipelined runtime system named Zipper. Zipper is a multi-threaded distributed runtime system that executes in a layer below the simulation and analysis applications. To further reduce the simulation application's stall time and improve data transfer performance, we design a concurrent data transfer optimization that uses both the HPC network and the parallel file system for higher bandwidth. The scalability of the Zipper system is verified by a performance model and a range of large-scale empirical experiments. Experimental results on an Intel multicore cluster and a Knights Landing HPC system demonstrate that the Zipper-based approach can outperform the fastest state-of-the-art I/O transport library by up to 220% using 13,056 processor cores.
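
The pipelining idea described above can be illustrated with a minimal single-node sketch. This is not Zipper's implementation or API: the function names (`simulate`, `analyze`, `run_pipeline`), the bounded in-memory queue, and the toy "analysis" (a per-block sum) are all assumptions chosen for illustration. The sketch only shows how a producer thread and a consumer thread can overlap work on fine-grained blocks, with the simulation stalling only when the buffer is full.

```python
import queue
import threading

def simulate(num_steps, block_size, out_q):
    """Hypothetical simulation: emits one small data block per time step."""
    for step in range(num_steps):
        block = [step * block_size + i for i in range(block_size)]
        out_q.put((step, block))  # stalls only when the bounded queue is full
    out_q.put(None)  # sentinel: no more data

def analyze(in_q, results):
    """Hypothetical analysis: reduces each block as soon as it arrives."""
    while True:
        item = in_q.get()
        if item is None:
            break
        step, block = item
        results[step] = sum(block)

def run_pipeline(num_steps=4, block_size=3, depth=2):
    """Run producer and consumer concurrently, coupled by a bounded queue."""
    q = queue.Queue(maxsize=depth)  # depth bounds the in-flight blocks
    results = {}
    producer = threading.Thread(target=simulate, args=(num_steps, block_size, q))
    consumer = threading.Thread(target=analyze, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results

if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch the queue depth plays the role of a staging buffer: a larger depth lets the simulation run further ahead of the analysis at the cost of more memory. A distributed runtime would replace the in-memory queue with network or file-system transfers.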
