TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism

Modern supercomputers have very powerful multi-core CPUs. The programming model on these supercomputers is switching from pure MPI to MPI for inter-node communication combined with shared memory and threads for intra-node communication. Consequently, the bottleneck in most systems is no longer computation but communication between nodes. In this paper, we present a new compositing algorithm for hybrid MPI parallelism that focuses on communication avoidance and on overlapping communication with computation, at the expense of evenly balancing the workload. The algorithm has three stages: a direct send stage in which nodes are arranged in groups and exchange regions of the image, followed by a tree compositing stage and a gather stage. We compare our algorithm with radix-k and binary-swap from the IceT library in a hybrid OpenMP/MPI setting, present strong-scaling results, and explain how we generally achieve better performance than these two algorithms.
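For illustration, the sketch below shows one way the direct send stage described above can overlap MPI communication with threaded compositing. It is not the authors' implementation: the image resolution, the group size G, the use of pre-multiplied RGBA floats, the requirement that the number of ranks is a multiple of G, and the assumption that pieces may be blended in arrival order are all simplifications made for this example, and the subsequent tree and gather stages are omitted.

```cpp
// Minimal sketch of a direct-send exchange with communication/computation
// overlap (hybrid MPI + OpenMP).  Assumptions (not from the paper): each rank
// holds a full locally rendered RGBA image, the number of ranks is a multiple
// of G, and pieces may be blended in the order they arrive.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

static const int W = 512, H = 512;   // image resolution (assumed)
static const int G = 4;              // direct-send group size (assumed)

// Blend 'src' over 'dst' (pre-multiplied alpha) using OpenMP threads.
static void composite_over(float* dst, const float* src, int npix) {
    #pragma omp parallel for
    for (int p = 0; p < npix; ++p) {
        const float a = src[4 * p + 3];
        for (int c = 0; c < 4; ++c)
            dst[4 * p + c] = src[4 * p + c] + (1.0f - a) * dst[4 * p + c];
    }
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int npix   = W * H;
    const int region = npix / G;                 // pixels per image region
    std::vector<float> image(4 * npix, 0.0f);    // locally rendered RGBA (placeholder)

    const int group = rank / G;                  // this rank's direct-send group
    const int slot  = rank % G;                  // region this rank will own
    const int base  = group * G;                 // first rank of the group

    // Post all receives for our region and all sends of the other regions
    // up front, so the blending below can overlap with transfers in flight.
    std::vector<std::vector<float>> inbox(G - 1, std::vector<float>(4 * region));
    std::vector<MPI_Request> rreq(G - 1), sreq(G - 1);
    for (int i = 0, k = 0; i < G; ++i) {
        if (i == slot) continue;
        MPI_Irecv(inbox[k].data(), 4 * region, MPI_FLOAT, base + i, 0,
                  MPI_COMM_WORLD, &rreq[k]);
        MPI_Isend(image.data() + 4 * region * i, 4 * region, MPI_FLOAT,
                  base + i, 0, MPI_COMM_WORLD, &sreq[k]);
        ++k;
    }

    // Composite each piece as soon as it arrives; MPI_Waitany lets the
    // OpenMP blend of one piece overlap with the remaining communication.
    float* mine = image.data() + 4 * region * slot;
    for (int done = 0; done < G - 1; ++done) {
        int k;
        MPI_Waitany(G - 1, rreq.data(), &k, MPI_STATUS_IGNORE);
        composite_over(mine, inbox[k].data(), region);
    }
    MPI_Waitall(G - 1, sreq.data(), MPI_STATUSES_IGNORE);

    // The full algorithm would now run a tree compositing stage across the
    // groups and gather the finished regions onto a display rank (omitted).
    if (rank == 0) std::printf("direct send stage finished on %d ranks\n", size);
    MPI_Finalize();
    return 0;
}
```

In a hybrid setting of this kind, one MPI rank per node with OpenMP threads doing the per-pixel blending keeps the number of messages small, which is in the spirit of the communication avoidance the abstract emphasizes.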

[1] Robert A. van de Geijn et al. Collective communication: theory, practice, and experience, 2007.

[2] Georg Hager et al. Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes, 2009, 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.

[3] Henry Fuchs et al. A sorting classification of parallel rendering, 1994, IEEE Computer Graphics and Applications.

[4] Xavier Cavin et al. Shift-Based Parallel Image Compositing on InfiniBand Fat-Trees, 2012, EGPGV@Eurographics.

[5] Kenneth D. Moreland et al. IceT users' guide and reference, 2009.

[6] Robert B. Ross et al. A configurable algorithm for parallel image-compositing applications, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[7] Kwan-Liu Ma et al. Massively parallel volume rendering using 2-3 swap image compositing, 2008, HiPC 2008.

[8] E. Wes Bethel et al. MPI-hybrid Parallelism for Volume Rendering on Large, Multi-core Systems, 2010, EGPGV@Eurographics.

[9] Ulrich Neumann. Communication costs for parallel volume-rendering algorithms, 1994, IEEE Computer Graphics and Applications.

[10] John Shalf et al. Exascale Computing Technology Challenges, 2010, VECPAR.

[11] E. Wes Bethel et al. Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems, 2012, IEEE Transactions on Visualization and Computer Graphics.

[12] Charles D. Hansen et al. A data distributed, parallel algorithm for ray-traced volume rendering, 1993.

[13] Juan Touriño et al. Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures, 2009, PVM/MPI.

[14] Jian Huang et al. An image compositing solution at scale, 2011, International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[15] Ray W. Grout et al. In Situ Visualization for Large-Scale Combustion Simulations, 2010.

[16] Kwan-Liu Ma et al. SLIC: scheduled linear image compositing for parallel volume rendering, 2003, IEEE Symposium on Parallel and Large-Data Visualization and Graphics (PVG 2003).

[17] Michael E. Papka et al. Performance Modeling of vl3 Volume Rendering on GPU-Based Clusters, 2014, EGPGV@EuroVis.

[18] Abhinav Vishnu et al. A performance comparison of current HPC systems: Blue Gene/Q, Cray XE6 and InfiniBand systems, 2014, Future Generation Computer Systems.

[19] William M. Hsu. Segmented ray casting for data parallel volume rendering, 1993.

[20] Renato Pajarola et al. Direct Send Compositing for Parallel Sort-last Rendering, 2007, Eurographics Symposium on Parallel Graphics and Visualization.

[21] Nelson L. Max et al. A contract based system for large data visualization, 2005, IEEE Visualization (VIS 05).

[22] L. Dagum et al. OpenMP: an industry standard API for shared-memory programming, 1998.