Automated Dynamic Data Redistribution

High-performance distributed-memory applications often load or receive data in a format that differs from the one the application uses. One such difference arises from how the application distributes data for parallel processing: data must be redistributed from the layout chosen by the producer to the layout the application needs among its processes. In this paper, we present a large-scale distributed-memory library, exposed through an easily integrated API, that automates data redistribution in MPI-enabled applications. We then evaluate the library on two scientific computing use cases. The first shows how dynamic data redistribution can greatly reduce load time when reading three-dimensional medical imaging data from disk. The second shows how dynamic data redistribution can facilitate in-transit analysis of computational fluid dynamics simulations, resulting in smaller output and faster time-to-discovery.
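To make the redistribution problem concrete, the sketch below computes a redistribution plan between two regular block decompositions of a 3D volume. For example, a producer layout might give each reader rank a stack of z-slices read from disk, while the application wants 2x2 blocks. Each non-empty intersection between a source box and a destination box is one message that the source rank must send. This is a minimal illustration under stated assumptions (regular process grids, half-open index boxes, row-major rank order). It is not the library's API, and every function name here (`block_ranges`, `block_decomposition`, `redistribution_plan`, and so on) is hypothetical. In an MPI code, a plan like this could drive point-to-point sends or an `MPI_Alltoallv` exchange.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical helpers for illustration only; not the paper's library API.
# A box is a half-open index range per dimension: ((lo0, hi0), (lo1, hi1), ...).
Box = Tuple[Tuple[int, int], ...]


def block_ranges(n: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, n) into `parts` contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def block_decomposition(shape: Tuple[int, ...], grid: Tuple[int, ...]) -> List[Box]:
    """Assign one box per rank for a regular process grid (row-major rank order)."""
    per_dim = [block_ranges(n, p) for n, p in zip(shape, grid)]
    return [tuple(r) for r in product(*per_dim)]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    out = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo >= hi:
            return None
        out.append((lo, hi))
    return tuple(out)


def volume(box: Box) -> int:
    """Number of elements covered by a box."""
    v = 1
    for lo, hi in box:
        v *= hi - lo
    return v


def redistribution_plan(src: List[Box], dst: List[Box]) -> Dict[Tuple[int, int], Box]:
    """Map (source rank, destination rank) -> sub-box the source must send."""
    plan = {}
    for s, sbox in enumerate(src):
        for d, dbox in enumerate(dst):
            overlap = intersect(sbox, dbox)
            if overlap is not None:
                plan[(s, d)] = overlap
    return plan


if __name__ == "__main__":
    shape = (8, 4, 4)                             # (z, y, x) voxels
    src = block_decomposition(shape, (4, 1, 1))   # 4 readers, one z-slab each
    dst = block_decomposition(shape, (2, 2, 1))   # 4 workers, 2x2 blocks in (z, y)
    plan = redistribution_plan(src, dst)
    for (s, d), box in sorted(plan.items()):
        print(f"rank {s} -> rank {d}: {box}")
```

In this example each reader's slab is split across two workers, giving eight messages whose volumes sum to the full 8 x 4 x 4 volume. The all-pairs intersection loop is the simplest correct choice; a production library would likely exploit the grid structure to avoid checking every pair when the rank count is large.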
