Combining in-situ and in-transit processing to enable extreme-scale scientific analysis

With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-processing-centric approach to a concurrent one based on in-situ processing, in-transit processing, or both. In this context, computations are considered in-situ if they use the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three analysis techniques commonly applied to large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system that coordinates the execution of the various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks, which support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
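
To illustrate how a descriptive-statistics analysis could be divided between the two kinds of resources, the following is a minimal sketch under stated assumptions, not the framework's actual implementation. It assumes each simulation rank reduces its local block to a small summary (count, mean, and sum of squared deviations) in-situ using Welford's one-pass update, and that a secondary resource merges those summaries in-transit with the standard pairwise combination formula. The names `Moments`, `local_moments`, `merge_moments`, and `global_moments` are hypothetical, and the data transfer itself (which the paper handles through DataSpaces and ADIOS) is replaced here by a plain Python list.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    """Hypothetical per-block summary: small enough to ship asynchronously."""
    n: int        # number of samples
    mean: float   # running mean
    m2: float     # sum of squared deviations from the mean


def local_moments(values):
    """In-situ step (assumed): one pass over a rank's local block (Welford)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Moments(n, mean, m2)


def merge_moments(a, b):
    """In-transit step (assumed): combine two summaries without raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def global_moments(partials):
    """Reduce all per-block summaries to one global summary."""
    return reduce(merge_moments, partials, Moments(0, 0.0, 0.0))


# Stand-in for three ranks' local data blocks (illustrative values only).
blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
partials = [local_moments(block) for block in blocks]  # would run in-situ
result = global_moments(partials)                      # would run in-transit
sample_variance = result.m2 / (result.n - 1)
```

In this arrangement only three numbers per block cross the network, so the cost of the in-transit merge is independent of the size of the raw simulation data.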
