Highly concurrent scalar processing

High-speed scalar processing is an essential characteristic of high-performance general-purpose computer systems. Highly concurrent execution of scalar code is difficult because of data dependencies and conditional branches. This paper proposes an architectural concept called guarded instructions to reduce the penalty of conditional branches in deeply pipelined processors. A code generation heuristic, the decision tree scheduling technique, reorders instructions in a complex of basic blocks so as to make efficient use of guarded instructions. Performance evaluations of several benchmarks are presented, including a module from the UNIX kernel. Even with these difficult scalar code examples, a speedup of two is achievable using conventional pipelined uniprocessors augmented by guarded instructions, and a speedup of three or more can be achieved using processors with parallel instruction pipelines.
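
As a rough illustration of the guarded-instruction idea, the sketch below models a tiny register machine in Python. The abstract does not specify the instruction format, so everything here is an assumption made for illustration: the `Instr` record, the `(condition register, required value)` guard encoding, and the `run` and `guarded_max` helpers are hypothetical names, not the paper's design. The point it shows is only that an if/else can be issued as a straight-line sequence in which each instruction takes effect only when its guard holds, so no conditional branch interrupts the pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]


@dataclass
class Instr:
    """One instruction of a hypothetical machine (illustrative only)."""
    dest: str                                  # destination register
    op: Callable[[Regs], int]                  # computes the value to write
    guard: Optional[Tuple[str, bool]] = None   # (condition register, required value)


def run(program: List[Instr], regs: Regs) -> Regs:
    """Execute the program in order, with no branches at all."""
    regs = dict(regs)  # work on a copy so the caller's registers are untouched
    for ins in program:
        if ins.guard is not None:
            cond_reg, required = ins.guard
            if bool(regs[cond_reg]) != required:
                continue  # guard fails: the instruction behaves as a no-op
        regs[ins.dest] = ins.op(regs)
    return regs


# Source form:  if a < b: m = b  else: m = a
# Guarded form: both arms are issued, and the guard selects which one writes m.
MAX_PROGRAM = [
    Instr("c", lambda r: int(r["a"] < r["b"])),        # c = (a < b)
    Instr("m", lambda r: r["b"], guard=("c", True)),   # m = b  if c
    Instr("m", lambda r: r["a"], guard=("c", False)),  # m = a  if not c
]


def guarded_max(a: int, b: int) -> int:
    return run(MAX_PROGRAM, {"a": a, "b": b})["m"]
```

Calling `guarded_max(3, 7)` runs all three instructions in sequence and only the arm whose guard matches writes `m`. A scheduler is then free to reorder or overlap the two guarded arms, since neither depends on a branch outcome, only on the condition register `c`.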
