Horizontally microprogrammable CPUs belong to a class of machines with statically schedulable parallel instruction execution (SPIE machines). Several experiments have shown that, within basic blocks, real code offers a potential speedup factor of only 2 or 3 when compacted for SPIE machines, even with unlimited hardware. This paper describes similar experiments which instead measure the potential parallelism available to any global compaction method, that is, one that compacts code beyond basic-block boundaries. Global compaction is a subject of current investigation; no measurements yet exist for implemented systems.
Our approach is first to assume that an oracle is available during compaction. The oracle resolves all dynamic considerations in advance, letting us find the maximum available parallelism without reformulating the algorithm: since the oracle answers all questions about conditional-jump directions and unresolved indirect memory references, the parallelism found is constrained only by legitimate data dependencies. Under this assumption, we find that typical scientific programs may be sped up by anywhere from 3 to 1000 times. These dramatic results provide an upper bound on what global compaction techniques can achieve. We also describe experiments in progress that progressively restrict the oracle, with the aim of eventually producing one that supplies only information obtainable by a very good compiler. This will yield a more practical measure of the parallelism obtainable via global compaction methods.
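The oracle-based measurement can be illustrated with a small sketch (not the authors' implementation; the trace format and function name are hypothetical). With branch directions and memory addresses assumed resolved in advance, only true data dependencies constrain the schedule, so each executed operation can be placed in the earliest cycle after the operations that produced its inputs; the speedup is the trace length divided by the schedule depth.

```python
# Illustrative sketch, not the paper's actual measurement tool.
# A dynamic trace is modeled as a list of (reads, writes) pairs, one per
# executed operation, where reads/writes are sets of register/memory names.
# Anti- and output dependencies are ignored, as if values were renamed;
# only flow (true data) dependencies limit the parallel schedule.

def oracle_speedup(trace):
    """Return sequential length / oracle-parallel schedule depth."""
    ready = {}   # name -> earliest cycle at which its latest value exists
    depth = 0    # depth of the compacted (parallel) schedule
    for reads, writes in trace:
        # Earliest cycle: one after the latest producer of any input.
        cycle = 1 + max((ready.get(r, 0) for r in reads), default=0)
        for w in writes:
            ready[w] = cycle
        depth = max(depth, cycle)
    return len(trace) / depth if depth else 1.0

# A fully sequential chain compacts not at all; independent ops collapse
# into a single cycle.
chain = [({"a"}, {"a"}) for _ in range(8)]             # each op needs the last
independent = [({f"x{i}"}, {f"y{i}"}) for i in range(8)]
```

Here `oracle_speedup(chain)` is 1.0 while `oracle_speedup(independent)` is 8.0, showing how the measured parallelism is bounded only by dependence chains once the oracle has removed all control and aliasing uncertainty.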