The exokernel operating system architecture

In traditional operating systems, only trusted software such as privileged servers or the kernel can manage resources. This thesis proposes a new approach, the exokernel architecture, which makes resource management unprivileged but safe by separating management from protection: an exokernel protects resources, while untrusted application-level software manages them. As a result, in an exokernel system, untrusted software (e.g., library operating systems) can implement abstractions such as virtual memory, file systems, and networking. The thesis addresses three questions: (1) how to build an exokernel system; (2) whether it is possible to build a real one; and (3) whether doing so is a good idea. Our results, drawn from two exokernel systems [25, 48], show that the approach yields dramatic benefits. For example, Xok, an exokernel, runs a web server an order of magnitude faster than the closest equivalent on the same hardware, runs common unaltered Unix applications up to three times faster, and improves global system performance by up to a factor of five. The thesis also discusses some of the new techniques we have used to remove the overhead of protection. The most unusual technique, untrusted deterministic functions, enables an exokernel to verify that applications correctly track the resources they own, eliminating the need for the exokernel to track those resources itself. Additionally, the thesis reflects on the subtle issues in using downloaded code for extensibility and the sometimes painful lessons learned in building three exokernel-based systems. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
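The idea behind untrusted deterministic functions can be illustrated with a small sketch. The abstract states only that such functions let the exokernel verify that applications correctly track the resources they own; everything else here is an assumption made for illustration, including the names `ToyKernel`, `allocate`, and `owns_bytes`, the toy metadata layout, and the exact form of the check. It is not taken from the thesis or from Xok.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical type: an application-supplied deterministic function that
# maps opaque metadata to the set of block numbers that metadata owns.
OwnsFn = Callable[[bytes], FrozenSet[int]]


class ToyKernel:
    """Illustrative kernel that protects blocks without interpreting metadata."""

    def __init__(self, free_blocks: Iterable[int]) -> None:
        self.free = set(free_blocks)

    def allocate(self, owns: OwnsFn, old_meta: bytes,
                 new_meta: bytes, block: int) -> bool:
        """Approve new_meta only if it claims exactly one extra, free block."""
        if block not in self.free:
            return False
        before = owns(old_meta)
        after = owns(new_meta)
        # The application did the bookkeeping; the kernel only checks that
        # the owned set grew by precisely the requested block.
        if block in before or after != before | {block}:
            return False
        self.free.discard(block)
        return True


def owns_bytes(meta: bytes) -> FrozenSet[int]:
    """Toy application layout (assumed): each metadata byte is a block number."""
    return frozenset(meta)


kernel = ToyKernel(free_blocks=[3, 4, 5])
ok = kernel.allocate(owns_bytes, bytes([1, 2]), bytes([1, 2, 3]), 3)
greedy = kernel.allocate(owns_bytes, bytes([1, 2, 3]), bytes([1, 2, 3, 4, 5]), 4)
```

In this sketch the kernel never parses the application's metadata format. It runs the supplied function before and after a proposed update and compares the results, so the first request is approved and the second, which tries to claim an extra block, is rejected.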

[1]  Meng Chang Chen,et al.  HiPEC: high performance external virtual memory caching , 1994, OSDI '94.

[2]  David Wetherall,et al.  Towards an active network architecture , 1996, CCRV.

[3]  David D. Clark,et al.  The structuring of systems using upcalls , 1985, SOSP '85.

[4]  Thomas Anderson,et al.  The case for application-specific operating systems , 1992, [1992] Proceedings Third Workshop on Workstation Operating Systems.

[5]  Dawson R. Engler,et al.  Server operating systems , 1996, EW 7.

[6]  Dawson R. Engler,et al.  Exterminate all operating system abstractions , 1995, Proceedings 5th Workshop on Hot Topics in Operating Systems (HotOS-V).

[7]  M. Frans Kaashoek,et al.  Embedded Inodes and Explicit Grouping: Exploiting Disk Bandwidth for Small Files , 1997, USENIX Annual Technical Conference.

[8]  James Gosling Java intermediate bytecodes , 1995 .

[9]  Margo I. Seltzer,et al.  Dealing with disaster: surviving misbehaved kernel extensions , 1996, OSDI '96.

[10]  Larry L. Peterson,et al.  Experiences with a high-speed network adaptor: a software perspective , 1994 .

[11]  Butler W. Lampson,et al.  An open operating system for a single-user machine , 1979, SOSP '79.

[12]  Brian N. Bershad,et al.  Protocol service decomposition for high-performance networking , 1994, SOSP '93.

[13]  Dawson R. Engler,et al.  The operating system kernel as a secure programmable machine , 1994, OPSR.

[14]  Yale N. Patt,et al.  Metadata update performance in file systems , 1994, OSDI '94.

[15]  Brian N. Bershad,et al.  Fast mutual exclusion for uniprocessors , 1992, ASPLOS V.

[16]  Brian N. Bershad,et al.  An Extensible Protocol Architecture for Application-Specific Networking , 1996, USENIX Annual Technical Conference.

[17]  Dawson R. Engler,et al.  DPF: Fast, Flexible Message Demultiplexing Using Dynamic Code Generation , 1996, SIGCOMM.

[18]  Robbert van Renesse,et al.  Experiences with the Amoeba distributed operating system , 1990, CACM.

[19]  Amin Vahdat,et al.  Tools for the development of application-specific virtual memory management , 1993 .

[20]  Jay Lepreau,et al.  The Flux OS Toolkit: reusable components for OS implementation , 1997, Proceedings. The Sixth Workshop on Hot Topics in Operating Systems (Cat. No.97TB100133).

[21]  Jim Zelenka,et al.  Informed prefetching and caching , 1995, SOSP.

[22]  Margo Seltzer,et al.  VINO: An Integrated Platform for Operating System and Database Research , 1994 .

[23]  Timothy Roscoe,et al.  The structure of a multi-service operating system , 1995 .

[24]  Jerry Huck,et al.  Architectural support for translation table management in large address space machines , 1993, ISCA '93.

[25]  Butler W. Lampson,et al.  Hints for Computer System Design , 1983, IEEE Software.

[26]  Peter B. Danzig,et al.  A Hierarchical Internet Object Cache , 1996, USENIX ATC.

[27]  Matthew I. Frank,et al.  UDM: User Direct Messaging for General-Purpose Multiprocessing , 1996 .

[28]  Brian N. Bershad,et al.  Dynamic binding for an extensible system , 1996, OSDI '96.

[29]  M. Frans Kaashoek,et al.  Software prefetching and caching for translation lookaside buffers , 1994, OSDI '94.

[30]  Margo I. Seltzer,et al.  A Comparison of OS Extension Technologies , 1996, USENIX Annual Technical Conference.

[31]  Peter Deutsch,et al.  A Flexible Measurement Tool for Software Systems , 1971, IFIP Congress.

[32]  John Jannotti Applying exokernel principles to conventional operating systems , 1998 .

[33]  Abraham Silberschatz,et al.  4.2BSD and 4.3BSD as examples of the UNIX system , 1985, CSUR.

[34]  Brian N. Bershad,et al.  Efficient Packet Demultiplexing for Multiple Endpoints and Large Messages , 1994, USENIX Winter.

[35]  Helen Custer,et al.  Inside Windows NT , 1992 .

[36]  Carl A. Waldspurger,et al.  Stride Scheduling: Deterministic Proportional-Share Resource Management , 1995 .

[37]  David R. Cheriton,et al.  Application-controlled physical memory using external page-cache management , 1992, ASPLOS V.

[38]  Mike Hibler,et al.  Microkernels meet recursive virtual machines , 1996, OSDI '96.

[39]  Robin Fairbairns,et al.  The Design and Implementation of an Operating System to Support Distributed Multimedia Applications , 1996, IEEE J. Sel. Areas Commun..

[40]  Steven McCanne,et al.  The BSD Packet Filter: A New Architecture for User-level Packet Capture , 1993, USENIX Winter.

[41]  David R. Cheriton,et al.  A caching model of operating system kernel functionality , 1994, OSDI '94.

[42]  David Banks,et al.  User-space protocols deliver high performance to applications on a low-cost Gb/s LAN , 1994, SIGCOMM '94.

[43]  Dawson R. Engler,et al.  Exokernel: an operating system architecture for application-level resource management , 1995, SOSP.

[44]  Henry M. Levy,et al.  Hardware and software support for efficient exception handling , 1994, ASPLOS VI.

[45]  Scott Devine,et al.  Disco: running commodity operating systems on scalable multiprocessors , 1997, TOCS.

[46]  R. Stallman EMACS the extensible, customizable self-documenting display editor , 1981, SIGPLAN SIGOA Symposium on Text Manipulation.

[47]  Joanne L. Martin,et al.  A Retrospective , 1988 .

[48]  William E. Weihl,et al.  Lottery scheduling: flexible proportional-share resource management , 1994, OSDI '94.

[49]  Kai Li,et al.  Implementation and performance of application-controlled file caching , 1994, OSDI '94.

[50]  Gerald J. Sussman,et al.  Structure and interpretation of computer programs , 1985, Proceedings of the IEEE.

[51]  Brian N. Bershad,et al.  Extensibility safety and performance in the SPIN operating system , 1995, SOSP.

[52]  Per Brinch Hansen,et al.  The nucleus of a multiprogramming system , 1970, CACM.

[53]  Brian N. Bershad,et al.  Scheduler activations: effective kernel support for the user-level management of parallelism , 1991, TOCS.

[54]  William M. Waite,et al.  Proceedings of the sixteenth ACM symposium on Operating systems principles , 1991, SOSP 1997.

[55]  Trevor N. Mudge,et al.  Design Tradeoffs For Software-managed Tlbs , 1994, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[56]  John K. Ousterhout,et al.  Why Aren't Operating Systems Getting Faster As Fast as Hardware? , 1990, USENIX Summer.

[57]  David R. Cheriton An experiment using registers for fast message-based interprocess communication , 1984, OPSR.

[58]  Gerry Kane,et al.  MIPS RISC Architecture , 1987 .

[59]  David D. Clark,et al.  Architectural considerations for a new generation of protocols , 1990, SIGCOMM '90.

[60]  Andrew C. Myers,et al.  A decentralized model for information flow control , 1997, SOSP.

[61]  David Mazières,et al.  Secure applications need flexible operating systems , 1997, Proceedings. The Sixth Workshop on Hot Topics in Operating Systems (Cat. No.97TB100133).

[62]  Amin Vahdat,et al.  Tools for the development of application-specific virtual memory management , 1993, OOPSLA '93.

[63]  George Eckel Inside Windows NT , 1993 .

[64]  Robert P. Goldberg,et al.  Survey of virtual machine research , 1974, Computer.

[65]  Robert J. Creasy,et al.  The Origin of the VM/370 Time-Sharing System , 1981, IBM J. Res. Dev..

[66]  Robert Grimm,et al.  Application performance and flexibility on exokernel systems , 1997, SOSP.

[67]  Jeffrey C. Mogul,et al.  The packet filter: an efficient mechanism for user-level network code , 1987, SOSP '87.

[68]  Jochen Liedtke,et al.  On micro-kernel construction , 1995, SOSP.

[69]  David Clark The structuring of systems using upcalls , 1985, SOSP 1985.

[70]  Larry L. Peterson,et al.  PathFinder: A Pattern-Based Packet Classifier , 1994, OSDI.

[71]  Andrew W. Appel,et al.  Virtual memory primitives for user programs , 1991, ASPLOS IV.

[72]  D. Probert,et al.  SPACE: a new approach to operating system abstraction , 1991, Proceedings 1991 International Workshop on Object Orientation in Operating Systems.

[73]  Joseph S. Barrera Invocation chaining: manipulating lightweight objects across heavyweight boundaries , 1993, Proceedings of IEEE 4th Workshop on Workstation Operating Systems. WWOS-III.

[74]  James Gosling,et al.  Java Intermediate Bytecode , 1995, Intermediate Representations Workshop.

[75]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[76]  William A. Wulf,et al.  HYDRA , 1974, Commun. ACM.

[77]  Claude Kaiser,et al.  CHORUS Distributed Operating System , 1988, Comput. Syst..

[78]  William J. Bolosky,et al.  Mach: A New Kernel Foundation for UNIX Development , 1986, USENIX Summer.

[79]  Robert Wahbe,et al.  Efficient software-based fault isolation , 1994, SOSP '93.

[80]  Jochen Liedtke,et al.  The performance of μ-kernel-based systems , 1997, SOSP.

[81]  Bryan Ford,et al.  CPU inheritance scheduling , 1996, OSDI '96.

[82]  Mark D. Hill,et al.  A new page table for 64-bit address spaces , 1995, SOSP.

[83]  Henry M. Levy,et al.  Separating data and control transfer in distributed operating systems , 1994, ASPLOS VI.

[84]  Larry L. Peterson,et al.  Scout: a communications-oriented operating system , 1995, Proceedings 5th Workshop on Hot Topics in Operating Systems (HotOS-V).

[85]  Willy Zwaenepoel,et al.  IO-Lite: a unified I/O buffering and caching system , 1999, TOCS.

[86]  Thorsten von Eicken,et al.  U-Net: a user-level network interface for parallel and distributed computing , 1995, SOSP.

[87]  Alessandro Forin,et al.  UNIX as an Application Program , 1990, USENIX Summer.

[88]  Yogen K. Dalal,et al.  Pilot: an operating system for a personal computer , 1980, CACM.

[89]  Dawson R. Engler,et al.  ASHs: Application-Specific Handlers for High-Performance Messaging , 1996, SIGCOMM.

[90]  M. F.,et al.  Bibliography , 1985, Experimental Gerontology.

[91]  Jochen Liedtke,et al.  Improving IPC by kernel design , 1994, SOSP '93.

[92]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.