Options for dynamic address translation in COMAs

In modern processors, the dynamic translation of virtual addresses to support virtual memory is done before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications keep growing, the latency and bandwidth demands on the TLB (Translation Lookaside Buffer) are becoming increasingly difficult to meet. The situation is worse in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem.

We evaluate and compare five options for virtual address translation in the context of COMAs (Cache Only Memory Architectures). The dynamic address translation mechanism can be located after the cache access, provided the cache is virtual. In one particular design, which we call V-COMA (for Virtual COMA), the physical address concept and the traditional TLB are eliminated. While still supporting virtual memory, V-COMA reduces the address translation overhead to a minimum.

V-COMA scales well and works better in systems with a large number of processors. As a machine running on virtual addresses, V-COMA provides a simple and consistent hardware model to the operating system and the compiler, in which further optimizations are possible.
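
The idea of placing translation after a virtual cache can be illustrated with a small sketch. This is a toy model only, not the V-COMA design (which removes physical addresses altogether): the class `VirtualCache`, the `page_table` dictionary, and the page and line sizes are all assumptions made for the example. It shows that when the cache is indexed and tagged by virtual address, translation is needed only on a miss.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed cache line size in bytes


class VirtualCache:
    """Toy virtually addressed cache; translation happens only on a miss."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.lines = {}               # virtual line address -> cached value
        self.translations = 0         # number of translations performed

    def translate(self, vaddr):
        # Stand-in for the translation mechanism located behind the cache.
        self.translations += 1
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.page_table[vpn] * PAGE_SIZE + offset

    def load(self, vaddr, memory):
        # Look up by virtual line address; no translation is needed on a hit.
        line = vaddr - vaddr % LINE_SIZE
        if line not in self.lines:
            paddr = self.translate(line)
            self.lines[line] = memory.get(paddr, 0)
        return self.lines[line]


cache = VirtualCache(page_table={0: 5})
memory = {5 * PAGE_SIZE: 42}   # one value stored at the start of frame 5
first = cache.load(0, memory)  # miss: one translation
second = cache.load(8, memory) # hit in the same line: no translation
```

In this sketch the second access hits in the cache, so the translation count stays at one; a conventional design that translates before or in parallel with every cache access would have translated twice.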
