An Overview of Topology Mapping Algorithms and Techniques in High‐Performance Computing

[1] Toshiyuki Shimizu, et al. Tofu: A 6D Mesh/Torus Interconnect for Exascale Computers, 2009, Computer.

[2] Franck Cappello, et al. The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community, 2009, Int. J. High Perform. Comput. Appl.

[3] Emmanuel Jeannot, et al. Near-Optimal Placement of MPI Processes on Hierarchical NUMA Architectures, 2010, Euro-Par.

[4] Laxmikant V. Kalé, et al. Dynamic topology aware load balancing algorithms for molecular dynamics applications, 2009, ICS.

[5] Larry Kaplan, et al. The Gemini System Interconnect, 2010, 2010 18th IEEE Symposium on High Performance Interconnects.

[6] Courtenay T. Vaughan, et al. Zoltan data management services for parallel dynamic applications, 2002, Comput. Sci. Eng.

[7] Charles E. Leiserson, et al. Fat-trees: Universal networks for hardware-efficient supercomputing, 1985, IEEE Transactions on Computers.

[8] Torsten Hoefler, et al. The PERCS High-Performance Interconnect, 2010, 2010 18th IEEE Symposium on High Performance Interconnects.

[9] Chao Yang, et al. Topology-Aware Mappings for Large-Scale Eigenvalue Problems, 2012, Euro-Par.

[10] William J. Dally, et al. Cost-Efficient Dragonfly Topology for Large-Scale Systems, 2009, IEEE Micro.

[11] Jake K. Aggarwal, et al. A Mapping Strategy for Parallel Processing, 1987, IEEE Transactions on Computers.

[12] Philip Heidelberger, et al. Blue Gene/L torus interconnection network, 2005, IBM J. Res. Dev.

[13] Scott F. Midkiff, et al. Heuristic Technique for Processor and Link Assignment in Multicomputers, 1991, IEEE Trans. Computers.

[14] Jeffrey M. Squyres, et al. Locality-Aware Parallel Process Mapping for Multi-core HPC Systems, 2011, 2011 IEEE International Conference on Cluster Computing.

[15] Mohan Kumar, et al. On generalized fat trees, 1995, Proceedings of 9th International Parallel Processing Symposium.

[16] E. Cuthill, et al. Reducing the bandwidth of sparse symmetric matrices, 1969, ACM '69.

[17] Brian W. Kernighan, et al. An efficient heuristic procedure for partitioning graphs, 1970, Bell Syst. Tech. J.

[18] Takao Hatazaki, et al. Rank Reordering Strategy for MPI Topology Creation Functions, 1998, PVM/MPI.

[19] Hubert Ritzdorf, et al. The scalable process topology interface of MPI 2.2, 2011, Concurr. Comput. Pract. Exp.

[20] Laxmikant V. Kalé, et al. CHARM++: a portable concurrent object oriented system based on C++, 1993, OOPSLA '93.

[21] Guillaume Mercier, et al. Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments, 2009, PVM/MPI.

[22] Arnold L. Rosenberg, et al. Issues in the Study of Graph Embeddings, 1980, WG.

[23] Minna Palmroth, et al. Topology Aware Process Mapping, 2012, PARA.

[24] Timothy Roscoe, et al. VF2x: Fast, Efficient Virtual Network Mapping for Real Testbed Workloads, 2012, TRIDENTCOM.

[25] Laxmikant V. Kalé, et al. Benefits of Topology Aware Mapping for Mesh Interconnects, 2008, Parallel Process. Lett.

[26] Kenji Ono, et al. Automatically optimized core mapping to subdomains of domain decomposition method on multicore parallel environments, 2013.

[27] Shang-Hua Teng, et al. How Good is Recursive Bisection?, 1997, SIAM J. Sci. Comput.

[28] Shahid H. Bokhari, et al. On the Mapping Problem, 1981, IEEE Transactions on Computers.

[29] B. Brandfass, et al. Rank reordering for MPI communication optimization, 2013.

[30] Fabrizio Petrini, et al. k-ary n-trees: high performance networks for massively parallel architectures, 1997, Proceedings 11th International Parallel Processing Symposium.