Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation

Exploiting instruction-level parallelism (ILP) is an effective way to improve the performance of modern superscalar and VLIW processors, and various software techniques can be applied to increase it. This paper describes and evaluates one such technique, dynamic memory disambiguation, which uses run-time address checks so that loops containing loads and stores can be scheduled more aggressively, thereby exposing more ILP. Our evaluation shows that when dynamic memory disambiguation is applied in conjunction with loop unrolling, register renaming, and static memory disambiguation, the ILP of memory-intensive benchmarks can increase by as much as 300 percent over the same loops compiled without dynamic memory disambiguation. Our measurements also indicate that, for the programs that benefit most from these optimizations, register usage does not exceed the number of registers available on most high-performance processors.
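To make the idea concrete, here is a minimal source-level sketch, assuming a simple loop `dst[i] = src[i] * k` whose two pointers may alias. It is an illustration only, not the paper's compiler implementation, whose exact run-time check is not described here. The names `scale`, `scale_safe`, and `ranges_overlap` are hypothetical. A run-time test compares the address ranges. If they are disjoint, an unrolled body runs in which all four loads are hoisted above all four stores into distinct temporaries (a source-level stand-in for register renaming). Otherwise, the original conservative loop runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Conservative version (hypothetical name): one element per iteration,
   so each load follows the previous store and aliasing is handled. */
static void scale_safe(float *dst, const float *src, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hypothetical run-time disambiguation check: nonzero if the byte ranges
   [p, p+bytes_p) and [q, q+bytes_q) overlap. Addresses are compared as
   integers to avoid relational comparison of unrelated pointers. */
static int ranges_overlap(const void *p, size_t bytes_p,
                          const void *q, size_t bytes_q) {
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    return a < b + bytes_q && b < a + bytes_p;
}

void scale(float *dst, const float *src, size_t n, float k) {
    /* Dynamic disambiguation: pick a version based on actual addresses. */
    if (ranges_overlap(dst, n * sizeof *dst, src, n * sizeof *src)) {
        scale_safe(dst, src, n, k);
        return;
    }
    size_t i = 0;
    /* Unrolled by 4: all loads are scheduled before all stores, using
       distinct temporaries. This reordering is legal only because the
       check above proved that dst and src do not overlap. */
    for (; i + 4 <= n; i += 4) {
        float s0 = src[i], s1 = src[i + 1], s2 = src[i + 2], s3 = src[i + 3];
        dst[i]     = s0 * k;
        dst[i + 1] = s1 * k;
        dst[i + 2] = s2 * k;
        dst[i + 3] = s3 * k;
    }
    /* Remainder iterations when n is not a multiple of 4. */
    for (; i < n; i++)
        dst[i] = src[i] * k;
}
```

Calling `scale(buf + 1, buf, 10, 2.0f)` on overlapping storage takes the conservative path and propagates each result into the next iteration, as the original loop would. With disjoint arrays, the unrolled path gives the scheduler four independent load/multiply/store chains to overlap. The cost is one address comparison per loop execution, plus a second copy of the loop body.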
