Task mapping on supercomputers with cellular networks

This thesis studies task mapping techniques for solving problems on parallel computers with hundreds of thousands of processors connected by cellular networks. Task mapping is both a serious intellectual challenge and a practical tool for unleashing the potential power of supercomputers. It is challenging because the search space is astronomically large and because a good mapping depends heavily on the exact nature of the application and the computer. We propose two general static mapping models to optimize the assignment of tasks on heterogeneous, distributed-memory, ultra-scalable computers. The models assume that the underlying application can be decomposed into subtasks with known computational loads and known inter-task communication demands, and that the computing system's specifications, such as individual processor speeds and inter-processor communication costs, are known or can be conveniently measured. The models abstract an application as a demand matrix and a parallel computer as a load matrix and a supply matrix. From these matrices we construct an objective function for completing the application on the given computer, and task mapping becomes the problem of minimizing its value. We have tested several applications on the Blue Gene/L supercomputer with 3D mesh and 3D torus networks. For a 2D wave equation, the mappings generated by our models reduced communication by 51% on the 3D mesh and 31% on the 3D torus relative to the default MPI rank order mapping. For the SMG2000 application, our mapping reduced communication and total time by 16% and 5%, respectively, relative to the default MPI rank order mapping. For NPB MG, we improved the communication time and the benchmark result by 53% and 13%, respectively. For NPB CG, we improved the communication time and the benchmark result by 43% and 22%, respectively. We believe that our models are useful for task assignment in a broad range of applications on the family of supercomputers with cellular networks.
