Processor Allocation for Horizontal and Vertical Parallelism and Related Speedup Bounds

The main aim of this paper is to study the allocation of processors to parallel programs executing on a multiprocessor system, and the resulting speedups. First, we model a parallel program as a sequence of steps, where each step consists of a set of operations that can execute in parallel. General bounds on the speedup achievable on a p-processor system are derived from this model. Measurements of code parallelism for the LINPACK numerical package are presented to support the belief that typical numerical programs contain substantial potential parallelism that a good restructuring compiler can discover. Next, a parallel program is represented as a task graph whose nodes are doacross loops (i.e., loops whose iterations can be partially overlapped). It is shown how processors can be allocated to exploit horizontal and vertical parallelism in such graphs. Two processor-allocation heuristics (WP and PA) are presented. PA, the core of WP, obtains efficient processor allocations for a set of independent parallel tasks; WP allocates processors to general task graphs. Finally, a general formula for the speedup of a doacross loop is given that is more accurate than the known formula.
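As a rough illustration of the step model described above, the classic bound computes the p-processor execution time by charging each step the ceiling of its width divided by p. This sketch is only the textbook bound implied by that model, not the paper's exact derivation; the function name and example widths are illustrative.

```python
import math

def speedup_bound(step_widths, p):
    """Speedup of a step-parallel program on p processors.

    step_widths[i] is the number of operations in step i that can run
    in parallel; successive steps execute one after another.
    """
    t_serial = sum(step_widths)                          # time on 1 processor
    t_parallel = sum(math.ceil(n / p) for n in step_widths)
    return t_serial / t_parallel

# Example: 4 steps of widths 8, 1, 8, 1 on p = 4 processors.
# T_1 = 18, T_4 = 2 + 1 + 2 + 1 = 6, giving a speedup of 3.0.
print(speedup_bound([8, 1, 8, 1], 4))
```

Note how the narrow steps (width 1) cap the speedup well below p, in the spirit of Amdahl's argument: serial or low-parallelism steps dominate once the wide steps are compressed.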
