Towards virtually-addressed memory hierarchies

Current cache hierarchies are indexed in parallel with a TLB lookup, but their tags are taken from the physical address, so the memory hierarchy is physically addressed. This design faces problems as more concurrency is exploited in the processor core and as the memory demands of emerging applications grow rapidly. The traditional TLB does not scale well inside the processor core, and its hit rate can be poor for data-intensive or scientific applications without much locality. At the same time, given current trends towards computing in memory and in communication interfaces, virtual addresses are needed not just inside the processor but throughout the memory hierarchy. These observations have prompted us to revisit the problem of moving virtual address translation away from the processor. This paper introduces new ideas to enable the use of virtual addresses throughout the memory hierarchy. The major idea is the replacement of the TLB with a small Synonym Lookaside Buffer (SLB), which scales well because its size depends on the number of synonyms, and not on the size of the application or of the physical memory. We also characterize synonym usage, evaluate the amount of cache and SLB flushing due to remapping of addresses, and compare the miss rates of various virtual and physical cache organizations for several application domains. These evaluations show that virtually-addressed memory hierarchies perform better overall than physically-addressed memory hierarchies. Finally, we show how virtually-addressed memory hierarchies facilitate natural, scalable multiprocessor extensions, as well as computing-in-memory in the context of general-purpose computers.
