Improving dependability by revisiting operating system design

Existing operating system (OS) designs provide inadequate isolation of user applications from errors that occur in OS services. If an error causes an OS service to fail, every application that depends on that service is affected. The OS design described in this paper mitigates this problem by reorganizing OS state with the goal of making OS services transparently restartable. To this end, application-related OS state is partitioned into isolated per-application memory regions. OS services are given access to these regions on a "need-to-know" basis while they process application requests, and for security, applications themselves cannot access them. By keeping application state outside the services that manipulate it, this design helps improve the dependability of the system.
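
The restart idea can be illustrated with a small Python simulation. This is a minimal sketch, not the paper's implementation. `StateStore`, `FileService`, `Supervisor`, and the `open_files` counter are all hypothetical names. Memory regions are modeled as dictionaries, and the rule that applications cannot touch the regions is only a convention here: applications are assumed to reach the system solely through `Supervisor.request`. A service keeps no per-application state of its own and is granted exactly one region per request. After a simulated crash it is replaced by a fresh instance that finds the application's state intact. To keep the retry safe, the fault is injected before any state is modified. Recovering from a crash that leaves a region half-updated is a harder problem that this sketch does not address.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """A service touched a region it was not granted for this request."""


class ServiceCrash(Exception):
    """Simulated fault inside an OS service."""


@dataclass
class StateStore:
    """Hypothetical per-application regions kept outside every service."""
    regions: dict = field(default_factory=dict)  # app_id -> that app's state
    granted: set = field(default_factory=set)    # regions reachable right now

    def grant(self, app_id):
        self.regions.setdefault(app_id, {})
        self.granted.add(app_id)

    def revoke(self, app_id):
        self.granted.discard(app_id)

    def region(self, app_id):
        # Need-to-know: only the region of the app being served is reachable.
        if app_id not in self.granted:
            raise AccessDenied(app_id)
        return self.regions[app_id]


class FileService:
    """Toy service with no private per-app state, so it is cheap to restart."""

    def __init__(self, store):
        self.store = store

    def handle(self, app_id, op, inject_fault=False):
        region = self.store.region(app_id)
        if inject_fault:
            # Fail before mutating state, so retrying the request is safe.
            raise ServiceCrash(op)
        if op == "open":
            region["open_files"] = region.get("open_files", 0) + 1
        return region.get("open_files", 0)


class Supervisor:
    """Routes requests, grants one region per request, restarts on crash."""

    def __init__(self, store, factory):
        self.store, self.factory = store, factory
        self.service = factory(store)
        self.restarts = 0

    def request(self, app_id, op, inject_fault=False):
        self.store.grant(app_id)
        try:
            return self.service.handle(app_id, op, inject_fault)
        except ServiceCrash:
            # Replace the crashed instance; app state in the store survives.
            self.restarts += 1
            self.service = self.factory(self.store)
            return self.service.handle(app_id, op)
        finally:
            # Access ends with the request, whether or not it succeeded.
            self.store.revoke(app_id)


store = StateStore()
sup = Supervisor(store, FileService)
sup.request("a", "open")
sup.request("b", "open")
after_crash = sup.request("a", "open", inject_fault=True)
```

In the demo at the bottom, application "a" has two files open after its crashed request is transparently retried, application "b" is unaffected, and no region remains granted once the requests complete.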
