Automatic generation of parallel programs with dynamic load balancing

Existing parallelizing compilers target parallel architectures in which all processors are dedicated to a single application. However, a new type of parallel system has become available in the form of high-performance workstations connected by high-speed networks. Such systems pose new problems for compilers because the processing power available on each workstation may change over time as other tasks compete for resources. We argue that a parallelizing compiler can generate code that dynamically shifts portions of the application's workload between processors to improve performance. We have implemented a run-time system that supports automatically generated programs with dynamic load balancing. We describe this system and present performance measurements. We also describe the compiler functionality needed to generate parallel programs with dynamic load balancing.
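The idea of shifting work toward processors that currently have more capacity can be illustrated with a minimal sketch. This is not the run-time system described in the paper. It assumes the work is a pool of independent loop iterations and that each workstation reports a measured processing rate (iterations per second); the function name `rebalance` and its parameters are hypothetical.

```python
def rebalance(total_iterations, rates):
    """Split loop iterations across workers in proportion to measured rates.

    total_iterations: number of remaining iterations (assumed independent).
    rates: measured iterations/second per worker (hypothetical measurements).
    Returns a list of iteration counts, one per worker, summing to the total.
    """
    if total_iterations < 0:
        raise ValueError("total_iterations must be non-negative")
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("rates must be a non-empty list of positive numbers")

    total_rate = sum(rates)
    # Ideal (fractional) share for each worker.
    shares = [total_iterations * r / total_rate for r in rates]
    # Round down first, then hand out the leftover iterations one at a time
    # to the workers with the largest fractional remainders.
    counts = [int(s) for s in shares]
    leftover = max(0, total_iterations - sum(counts))
    by_remainder = sorted(
        range(len(rates)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # A workstation slowed by a competing task reports a lower rate,
    # so it receives fewer iterations in the next phase.
    print(rebalance(100, [1.0, 1.0, 2.0]))
```

A run-time system could call such a routine periodically, compare the result with the current assignment, and move only the difference between workers. How often to rebalance and how to account for data movement cost are design choices this sketch leaves out.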
