Eliminating the address translation bottleneck for physical address cache

This paper presents and analyzes two architectural techniques that aim to remove the Translation Lookaside Buffer (TLB) access delay from the critical path of scalar processors that use physical address caches. The first technique, parallel address translation, hides the TLB access delay by accessing the cache in parallel with the TLB. Normally this is possible only when the cache index comes entirely from the page offset, which limits the cache size to the product of the cache associativity and the virtual memory page size; a set-associative virtual memory map lifts this limit. The second technique, lazy address translation, uses the base register and offset of a memory reference as a cache for the corresponding physical page, so the TLB is bypassed entirely whenever this cache hits and is accessed only when it misses. A trace-driven simulation study shows that, for the workload studied, parallel address translation performs best when the virtual memory is 16-way set associative, and that lazy address translation increases the average cycles per instruction (CPI) by less than 1.3%.
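The size limit that parallel address translation relaxes can be illustrated with a small model. The Python sketch below assumes a simple placement rule in which a virtual page may only occupy a frame whose low-order frame-number bits equal the low-order bits of its virtual page number (its set). The function names `max_parallel_cache_size`, `place_page`, and `cache_set` and all parameter values are hypothetical and are not taken from the paper. The model shows that such a mapping fixes extra physical index bits before translation, so the cache can exceed associativity times page size while its virtual and physical set indices still agree.

```python
"""Sketch: a set-associative virtual memory map lets more physical index
bits be known before TLB translation. The modulo placement rule and all
names here are illustrative assumptions, not the paper's design."""


def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two; raise ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def max_parallel_cache_size(page_size: int, cache_assoc: int,
                            num_frames: int, vm_assoc: int) -> int:
    """Largest physically indexed cache whose index is known pre-translation.

    Without a constrained mapping only the page offset is known, giving the
    classic bound cache_assoc * page_size. A vm_assoc-way map with
    num_frames // vm_assoc sets also fixes the low log2(sets) bits of the
    physical frame number, multiplying that bound by the number of VM sets.
    """
    vm_sets = num_frames // vm_assoc
    return cache_assoc * page_size * (1 << log2_exact(vm_sets))


def place_page(vpn: int, way: int, num_frames: int, vm_assoc: int) -> int:
    """Hypothetical placement: the set is vpn mod #sets, the OS picks the way."""
    vm_sets = num_frames // vm_assoc
    if not 0 <= way < vm_assoc:
        raise ValueError("way out of range")
    # The frame's low-order bits equal the VM set number, i.e. the vpn's low bits.
    return way * vm_sets + (vpn % vm_sets)


def cache_set(addr: int, line_size: int, cache_sets: int) -> int:
    """Cache set index taken from an address."""
    return (addr // line_size) % cache_sets


if __name__ == "__main__":
    PAGE, FRAMES, LINE, ASSOC, VM_ASSOC = 4096, 4096, 32, 2, 16  # assumed values
    print(max_parallel_cache_size(PAGE, ASSOC, FRAMES, FRAMES))    # 8192
    size = max_parallel_cache_size(PAGE, ASSOC, FRAMES, VM_ASSOC)
    print(size)                                                    # 2097152
    sets = size // (ASSOC * LINE)
    for vpn in (0, 5, 0x1234, 0xABCDE):
        for way in range(VM_ASSOC):
            for off in (0, 100, PAGE - 1):
                va = vpn * PAGE + off
                pa = place_page(vpn, way, FRAMES, VM_ASSOC) * PAGE + off
                assert cache_set(va, LINE, sets) == cache_set(pa, LINE, sets)
    print("virtual and physical cache indices agree")
```

With 4 KiB pages, a 2-way cache, and 4096 frames (all assumed), the conventional limit is 8 KiB, while a 16-way virtual memory map leaves 256 sets and raises the limit to 2 MiB. Lowering the virtual memory associativity raises this limit further but constrains page placement more, which is presumably why an intermediate associativity performs best.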
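Lazy address translation can be pictured as a small translation cache attached to each base register. The Python sketch below is a software model under assumed details: each register remembers the virtual page number (VPN) and physical page number (PPN) of its last translated reference, and a later reference through the same register reuses that PPN when the effective address base + offset falls in the same virtual page. Only otherwise is the TLB (modeled here as a plain dict) consulted. The class `LazyTranslator`, its methods, and the 4 KiB page size are hypothetical, and the hit condition the paper uses in hardware may differ.

```python
"""Sketch of lazy address translation: each base register caches the
physical page of its last reference. All names are hypothetical."""
from __future__ import annotations

from dataclasses import dataclass, field

PAGE_SHIFT = 12  # 4 KiB pages (assumed)


@dataclass
class LazyTranslator:
    """Per-base-register physical-page caching in front of a TLB."""
    page_table: dict[int, int]  # VPN -> PPN; stands in for the TLB and page table
    reg_cache: dict[int, tuple[int, int]] = field(default_factory=dict)  # reg -> (VPN, PPN)
    references: int = 0
    tlb_accesses: int = 0

    def translate(self, reg: int, base: int, offset: int) -> int:
        """Physical address of base + offset, issued through register `reg`."""
        self.references += 1
        va = base + offset
        vpn, page_off = va >> PAGE_SHIFT, va & ((1 << PAGE_SHIFT) - 1)
        cached = self.reg_cache.get(reg)
        if cached is not None and cached[0] == vpn:
            ppn = cached[1]  # same virtual page as last time: TLB bypassed
        else:
            self.tlb_accesses += 1  # caching scheme failed: consult the TLB
            ppn = self.page_table[vpn]
            self.reg_cache[reg] = (vpn, ppn)
        return (ppn << PAGE_SHIFT) | page_off

    def invalidate(self) -> None:
        """Drop all cached translations, e.g. after a page-table change."""
        self.reg_cache.clear()


if __name__ == "__main__":
    mmu = LazyTranslator(page_table={0x10: 0x7A, 0x11: 0x3C})
    base = 0x10 << PAGE_SHIFT  # array starting at virtual page 0x10
    for i in range(0, 8192, 8):  # 1024 eight-byte loads across two pages
        mmu.translate(reg=5, base=base, offset=i)
    print(mmu.references, mmu.tlb_accesses)  # 1024 2
```

In this example a sequential sweep through two pages needs only two TLB accesses out of 1024 references. The model only counts how often the TLB would be consulted; it does not capture how the same-page check would be performed in hardware.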
