On the combination of hardware and software concurrency extraction methods

Parallelism has been shown to be a promising means of enhancing computer performance. However, parallelism adds considerable complexity to programming. This has led to the development of automatic concurrency extraction techniques. Prior work has demonstrated that static program restructuring via compiler-based techniques exposes a large degree of parallelism to the target machine. Purely hardware-based extraction techniques (without software preprocessing) have also demonstrated significant, though lesser, degrees of parallelism. This paper considers the performance effects of combining hardware and software techniques. The concurrency extracted from a given set of benchmarks by each technique separately, and by both together, is determined via simulation and/or analysis. The “common parallelism” extracted by both methods is also considered, using new metrics. Analytic techniques for predicting the performance of specific programs are also described.
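A common way to quantify the concurrency available in a sequential instruction stream is the dataflow limit. This is the number of instructions divided by the length of the longest chain of true (read-after-write) dependences.

The sketch below illustrates that general measure only. It is not the paper's "common parallelism" metric, which the abstract does not define. It assumes the following:

- unit latency;
- unlimited functional units;
- perfect register renaming, so only read-after-write dependences constrain issue;
- a hypothetical `(destination, sources)` instruction format.

```python
from typing import Sequence

# Hypothetical instruction format (an assumption, not from the paper):
# (destination register, tuple of source registers).
Instr = tuple[str, tuple[str, ...]]


def dataflow_parallelism(trace: Sequence[Instr]) -> float:
    """Return instruction count / critical-path length for a trace.

    Assumes unit latency, unlimited functional units, and only true
    (read-after-write) dependences, i.e. perfect register renaming.
    """
    ready: dict[str, int] = {}  # register -> cycle its latest value is produced
    depth = 0                   # length of the longest dependence chain so far
    for dest, srcs in trace:
        # An instruction executes one cycle after its last operand is produced;
        # registers never written in the trace are treated as available at cycle 0.
        cycle = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = cycle
        depth = max(depth, cycle)
    return len(trace) / depth if depth else 0.0


# r1 = a + b; r2 = c + d; r3 = r1 * r2; r4 = e + f
example = [
    ("r1", ("a", "b")),
    ("r2", ("c", "d")),
    ("r3", ("r1", "r2")),
    ("r4", ("e", "f")),
]
print(dataflow_parallelism(example))  # 4 instructions over a 2-cycle critical path
```

Because the sketch ignores write-after-read and write-after-write hazards, it gives an upper bound. A finite hardware instruction window, or a compiler that does not restructure the code, would generally extract less.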
