ARB: A Hardware Mechanism for Dynamic Reordering of Memory References

To exploit instruction-level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references, in particular to execute loads before stores that precede them in the sequential instruction stream. To guarantee correct execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardware mechanism, called an Address Resolution Buffer (ARB), for performing dynamic reordering of memory references. The ARB supports the following features: (1) dynamic memory disambiguation in a decentralized manner, (2) multiple memory references per cycle, (3) out-of-order execution of memory references, (4) unresolved loads and stores, (5) speculative loads and stores, and (6) memory renaming. The paper presents the results of a simulation study that we conducted to verify the efficacy of the ARB for a superscalar processor. The paper also shows the ARB's application in a multiscalar processor.
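The abstract does not describe the ARB's internal organization, so the following is only a toy software sketch of the general idea of dynamic memory disambiguation: a load may execute before an earlier store, and the violation is detected when that store later executes to the same address. The class name `TinyARB`, the use of integer sequence numbers for program order, and all method names are assumptions made for illustration, not the paper's design.

```python
class TinyARB:
    """Toy address-indexed buffer for reordered loads and stores.

    Hypothetical illustration only: sequence numbers stand in for
    program order, and nothing here models banks, ports, or timing.
    """

    def __init__(self, memory=None):
        self.memory = dict(memory or {})  # committed memory state
        # addr -> {"loads": {seq: source store seq or None},
        #          "stores": {seq: value}}
        self.entries = {}

    def _entry(self, addr):
        return self.entries.setdefault(addr, {"loads": {}, "stores": {}})

    def load(self, seq, addr):
        """Execute a load; forward from the nearest earlier buffered store."""
        e = self._entry(addr)
        earlier = [s for s in e["stores"] if s < seq]
        src = max(earlier) if earlier else None
        e["loads"][seq] = src  # remember where the value came from
        if src is not None:
            return e["stores"][src]
        return self.memory.get(addr, 0)

    def store(self, seq, addr, value):
        """Execute a store; return later loads that read stale data."""
        e = self._entry(addr)
        e["stores"][seq] = value  # buffered, not yet in memory
        violated = sorted(
            l for l, src in e["loads"].items()
            if l > seq and (src is None or src < seq)
        )
        for l in violated:
            del e["loads"][l]  # these loads must be re-executed
        return violated

    def commit_store(self, seq, addr):
        """Retire a buffered store by writing its value to memory."""
        self.memory[addr] = self._entry(addr)["stores"].pop(seq)


arb = TinyARB({0x10: 1})
early = arb.load(3, 0x10)         # load runs before the earlier store
squashed = arb.store(1, 0x10, 7)  # store arrives late, exposes the load
retry = arb.load(3, 0x10)         # re-executed load sees the store's value
```

In this sketch several stores to the same address can be buffered at once, each tagged with its own sequence number, which is a loose analogue of the memory renaming listed in the abstract.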
