Palm: an integrated parallelism enhancement environment with static-dynamic scheduling

While it is widely agreed that algorithm, compiler, and architecture should work hand in hand to produce the most efficient parallel code, a unified research effort has been lacking: there is no environment that a user can quickly apply to map a 'dusty deck', as well as software in newer languages, to efficient code for a variety of commercial compilers and architectures. The authors have undertaken an integrated approach leading to an environment that may be used, in a language-independent manner, for both important classes of architectures: shared-memory and private-memory MIMD machines. Various utilities measure the potential parallelism in an algorithmic step, modify source code to help the compiler exploit the embedded parallelism, and tune the code to a specific architecture. The Static Dynamic Scheduler, which is part of the environment, estimates the processor requirements of the basic blocks of a given program and allocates processors partially at compile time and partially at run time to obtain a good tradeoff between speedup and utilization.
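The split between compile-time and run-time allocation can be illustrated with a toy model. The sketch below is not the scheduler described in the paper. It is a minimal illustration under stated assumptions: each basic block has a compile-time estimate of its processor requirement and a possibly different actual requirement known only at run time, a fraction of the estimate is reserved statically, the rest is acquired dynamically, and each dynamically acquired processor costs one time unit of overhead. All names (`Block`, `allocate`, `simulate`, `static_fraction`, `overhead`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    work: int      # unit operations in the basic block (assumed known)
    estimate: int  # compile-time estimate of useful processors
    actual: int    # useful processors observed at run time


def allocate(block, total, static_fraction):
    """Return (static, dynamic) processor counts for one block.

    static  -- reserved at compile time from the estimate (at least 1)
    dynamic -- acquired at run time to reach the actual requirement
    """
    static = min(total, max(1, int(block.estimate * static_fraction)))
    dynamic = max(0, min(block.actual, total) - static)
    return static, dynamic


def simulate(blocks, total, static_fraction, overhead=1):
    """Run blocks one after another; return (speedup, utilization).

    Assumption: run-time acquisition costs `overhead` time units per
    processor, and processors beyond `actual` sit idle.
    """
    serial_time = parallel_time = held = 0
    for b in blocks:
        static, dynamic = allocate(b, total, static_fraction)
        p = static + dynamic
        useful = min(p, b.actual)
        t = -(-b.work // useful) + overhead * dynamic  # ceiling division
        serial_time += b.work
        parallel_time += t
        held += p * t  # processor-time held, busy or idle
    return serial_time / parallel_time, serial_time / held


if __name__ == "__main__":
    blocks = [
        Block("b1", work=120, estimate=8, actual=4),  # overestimated
        Block("b2", work=60, estimate=2, actual=6),   # underestimated
    ]
    for frac in (1.0, 0.5, 0.0):
        s, u = simulate(blocks, total=8, static_fraction=frac)
        print(f"static_fraction={frac:.1f} speedup={s:.2f} utilization={u:.2f}")
```

In this toy model, reserving the full estimate statically gives the highest speedup but leaves processors idle when the estimate is too high, while deferring more of the allocation to run time raises utilization at the price of acquisition overhead. The numbers depend entirely on the assumed cost model and are only meant to show the shape of the tradeoff.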
