Parallel I/O, analysis, and visualization of a trillion particle simulation

Petascale plasma physics simulations have recently entered the regime of simulating trillions of particles. These unprecedented simulations generate massive amounts of data, posing significant challenges in storage, analysis, and visualization. In this paper, we present parallel I/O, analysis, and visualization results from a VPIC trillion-particle simulation running on 120,000 cores, which produces ~30 TB of data for a single timestep. We demonstrate the successful application of H5Part, a particle data extension of parallel HDF5, for writing the dataset at a significant fraction of the system's peak I/O rate. To enable efficient analysis, we develop a hybrid parallel version of FastQuery that indexes and queries data using multi-core CPUs on distributed-memory hardware. We show good scalability for the FastQuery implementation using up to 10,000 cores. Finally, we apply this indexing/query-driven approach to facilitate the first-ever analysis and visualization of the trillion-particle dataset.
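The query-driven approach depends on indexing particle attributes so that a selection such as "energy above a threshold" only touches the matching particles. The following is a toy, single-node sketch of a binned bitmap index with a candidate check, the general family of techniques that FastBit-style indexes build on. The names `BinnedBitmapIndex` and `partitioned_query`, the sample values, and the bin edges are all hypothetical; this is not FastQuery's API. The partitioned variant only mimics, with Python threads, the idea of each worker indexing its own slice of the data; it is not MPI and, because of CPython's GIL, says nothing about performance.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


class BinnedBitmapIndex:
    """Toy binned bitmap index over one particle attribute (hypothetical)."""

    def __init__(self, values, edges):
        # `edges` must be sorted. bisect_right puts a value v in bin b, where
        # b = 0 means v < edges[0] and b = len(edges) means v >= edges[-1].
        self.values = list(values)
        self.edges = list(edges)
        # One bitmap per bin; a Python int serves as an uncompressed bitset
        # whose bit r is set when row r falls in that bin.
        self.bitmaps = [0] * (len(self.edges) + 1)
        for row, v in enumerate(self.values):
            self.bitmaps[bisect_right(self.edges, v)] |= 1 << row

    def query_greater(self, threshold):
        """Return the sorted row ids whose value is > threshold."""
        b = bisect_right(self.edges, threshold)
        hits = 0
        # Every bin after b only holds values >= edges[b] > threshold,
        # so its bitmap is OR-ed in without touching the raw data.
        for bitmap in self.bitmaps[b + 1:]:
            hits |= bitmap
        # Bin b straddles the threshold: check its rows against the raw
        # values (the "candidate check").
        candidates, row = self.bitmaps[b], 0
        while candidates:
            if candidates & 1 and self.values[row] > threshold:
                hits |= 1 << row
            candidates >>= 1
            row += 1
        return [r for r in range(len(self.values)) if (hits >> r) & 1]


def partitioned_query(values, edges, threshold, n_parts=4):
    """Index contiguous slices independently, then merge global row ids.

    Threads stand in for the workers of a hybrid MPI + threads code; under
    CPython's GIL this shows the data decomposition, not a speedup.
    """
    size = max(1, -(-len(values) // n_parts))  # ceiling division

    def work(start):
        local = BinnedBitmapIndex(values[start:start + size], edges)
        # Shift local row ids back to global positions.
        return [start + r for r in local.query_greater(threshold)]

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        chunks = list(pool.map(work, range(0, len(values), size)))
    return sorted(r for chunk in chunks for r in chunk)


energy = [0.5, 3.2, 1.7, 9.1, 4.4, 2.0]   # made-up sample values
edges = [1.0, 2.0, 4.0, 8.0]              # made-up bin boundaries
print(BinnedBitmapIndex(energy, edges).query_greater(3.0))   # [1, 3, 4]
print(partitioned_query(energy, edges, 3.0, n_parts=3))       # [1, 3, 4]
```

Only the single bin that straddles the threshold requires reading raw values; all other bins are answered from bitmaps alone. Production indexes such as FastBit additionally compress the bitmaps (e.g. with word-aligned hybrid encoding) instead of storing raw bitsets as this sketch does.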
