Software-oriented memory-management design

Changing technology trends, notably cheaper and faster memory hierarchies, have made it worthwhile to revisit many hardware-oriented design decisions made in previous decades. Hardware-oriented designs, in which special-purpose hardware performs some dedicated function, are a response to the high cost of executing instructions out of memory: when caches are expensive, slow, or scarce, it is perfectly reasonable to build hardware state machines that neither compete with user applications for cache space nor rely on the performance of the caches. In contrast, when the caches are large enough to withstand competition between the application and the operating system, the cost of executing operating system functions out of the memory subsystem decreases significantly, and software-oriented designs become viable. Software-oriented designs, which dispense with special-purpose hardware and perform the same function entirely in software, offer dramatically increased flexibility over hardware state machines at a modest cost in performance. This dissertation explores a software-oriented design for a virtual memory management system. It shows not only that a software design is more flexible than hardware designs, but also that a software scheme can perform as well as most hardware schemes. Eliminating dedicated special-purpose hardware from the processor design saves chip area and reduces power consumption, thus lowering overall system cost. Moreover, a flexible design aids the portability of system software. A software-oriented design methodology should therefore benefit architects of many different microprocessor designs, from general-purpose processors in PC-class and workstation-class computers to embedded processors, where cost tends to have a higher priority than performance.
The particular implementation described in the following chapters, which is centered on a virtual cache hierarchy managed by the operating system, is shown to be useful for real-time systems, shared-memory multiprocessors, and architecture emulation.
