Hierarchical Mapping for HPC Applications

As high-performance computing systems scale up, mapping the tasks of a parallel application onto physical processors so that they communicate efficiently becomes a critical performance issue. Existing algorithms were usually designed to map applications with regular communication patterns. Their mapping criteria usually overlook the size of communicated messages, which is the primary factor in communication time. In addition, most have time complexities too high for large-scale problems. In this paper, we present a hierarchical mapping algorithm (HMA) that is capable of mapping applications with irregular communication patterns. It first partitions tasks according to their run-time communication information. Tasks that communicate with each other more frequently are regarded as strongly connected. Based on this connectivity strength, the tasks are partitioned into super nodes using algorithms from spectral graph theory. The hierarchical partitioning reduces the complexity of the mapping algorithm and thereby achieves scalability. Finally, the run-time communication information is used again in a fine-tuning step to explore better mappings. Our experiments show that the mapping algorithm reduces point-to-point communication time by up to 20% for PDGEMM, a ScaLAPACK matrix multiplication kernel, and by up to 7% for AMG2006, a tier 1 application of the Sequoia benchmark.
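The partitioning step described above can be illustrated with a small sketch. It is not the paper's HMA implementation: it assumes the run-time communication information is available as a symmetric matrix of message volumes between tasks, and it uses one common spectral technique, recursive bisection along the Fiedler vector of the graph Laplacian, to form super nodes. The names `fiedler_vector`, `super_nodes`, `traffic`, and `max_size` are hypothetical, and the balanced median split and the power-iteration solver are simplifications chosen to keep the example dependency-free.

```python
from math import sqrt


def fiedler_vector(weights, iterations=200):
    """Approximate the Fiedler vector of the Laplacian L = D - W.

    Assumption: `weights` is a symmetric matrix of communication volume.
    Uses power iteration on (shift*I - L) with the constant vector removed.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree.
    shift = 2.0 * max(degree) or 1.0
    # Deterministic, zero-mean starting vector.
    x = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # y = (shift*I - D + W) x
        y = [
            (shift - degree[i]) * x[i]
            + sum(weights[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        # Project out the constant eigenvector, then normalize.
        mean = sum(y) / n
        y = [v - mean for v in y]
        norm = sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return x


def super_nodes(tasks, weights, max_size):
    """Recursively bisect `tasks` until each group has at most `max_size` tasks."""
    if len(tasks) <= max_size:
        return [sorted(tasks)]
    # Restrict the communication matrix to the tasks in this group.
    sub = [[weights[a][b] for b in tasks] for a in tasks]
    f = fiedler_vector(sub)
    # Balanced split: lower half of the Fiedler ordering vs. upper half.
    order = sorted(range(len(tasks)), key=lambda k: (f[k], k))
    half = len(tasks) // 2
    left = [tasks[k] for k in order[:half]]
    right = [tasks[k] for k in order[half:]]
    return super_nodes(left, weights, max_size) + super_nodes(right, weights, max_size)


# Hypothetical traffic matrix: tasks 0 and 2 exchange heavy traffic,
# as do tasks 1 and 3; the remaining links are light.
traffic = [
    [0, 1, 100, 0],
    [1, 0, 0, 100],
    [100, 0, 0, 1],
    [0, 100, 1, 0],
]
groups = super_nodes(list(range(4)), traffic, max_size=2)
print(groups)
```

With this traffic matrix, the heavily communicating pairs end up in the same super node even though their task IDs are not adjacent, which is the behavior an irregular communication pattern calls for. Each super node could then be mapped to a group of nearby processors, with a later fine-tuning pass refining the placement.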
