Scalability of microkernel-based systems

Microkernel-based systems divide operating system functionality into individual, isolated components. These system components are subject to application-class protection and isolation. This structuring method has a number of benefits, such as fault isolation between system components, safe extensibility, co-existence of different policies, and isolation between mutually distrusting components. However, such strict isolation limits the information flow between subsystems, including information that is essential for performance and scalability in multiprocessor systems. Semantically richer kernel abstractions scale at the cost of generality and minimality, two desired properties of a microkernel.

I propose an architecture that allows for dynamic adjustment of scalability-relevant parameters in a general, flexible, and safe manner. I introduce isolation boundaries for microkernel resources and the system processors. The boundaries are controlled at user level. Operating system components and applications can transform their semantic information into three basic parameters relevant for scalability: the involved processors (depending on their relation and interconnect), the degree of concurrency, and groups of resources.

I developed a set of mechanisms that allow a kernel to:
1. efficiently track processors on a per-resource basis, with support for very large numbers of processors,
2. dynamically and safely adjust lock primitives at runtime, including full deactivation of kernel locks when there is no concurrency,
3. dynamically and safely adjust locking granularity at runtime,
4. provide a scalable translation look-aside buffer (TLB) coherency algorithm that uses versions to minimize interprocessor interference for concurrent memory resource re-allocations, and
5. efficiently track and communicate resource usage in a component-based operating system.

Based on my architecture, it is possible to efficiently co-host multiple isolated, independent, and loosely coupled systems on larger multiprocessor systems, and also to fine-tune individual subsystems of a system that have different and potentially conflicting scalability and performance requirements. I describe the application of my techniques to a real system: L4Ka::Pistachio, the latest variant of an L4 microkernel. L4Ka::Pistachio is used in a variety of research and industry projects. Introducing a new dimension to a system, the parallelism of multiprocessors, naturally introduces new complexity and overhead. I evaluate my solutions by comparing them against the most challenging competitor: the uniprocessor variant of the very same, highly optimized microkernel.
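
The version-based TLB coherency idea mentioned above can be illustrated with a small simulation. This is only a sketch of one plausible reading of the abstract, not the algorithm as implemented in L4Ka::Pistachio: it assumes each processor keeps a counter that is incremented on every full TLB flush, and that the kernel snapshots those counters when a page is unmapped. All names (`Processor`, `PendingUnmap`, `cpus_needing_shootdown`) are hypothetical.

```python
class Processor:
    """Toy model of a CPU with a TLB and a flush counter (assumed design)."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.tlb_version = 0   # incremented on every full TLB flush
        self.tlb = set()       # virtual pages currently cached in the TLB

    def touch(self, page):
        # Accessing a page loads its translation into the TLB.
        self.tlb.add(page)

    def flush_tlb(self):
        # A full flush (e.g. on an address-space switch) drops all entries
        # and advances the version.
        self.tlb.clear()
        self.tlb_version += 1


class PendingUnmap:
    """Records the TLB versions of the tracked processors at unmap time."""

    def __init__(self, page, tracked_cpus):
        self.page = page
        self.snapshot = {cpu.cpu_id: cpu.tlb_version for cpu in tracked_cpus}

    def cpus_needing_shootdown(self, cpus):
        # A tracked processor whose version is unchanged has not flushed
        # since the unmap and may still hold a stale entry. Processors that
        # flushed in the meantime need no interprocessor interrupt.
        return [cpu.cpu_id for cpu in cpus
                if cpu.cpu_id in self.snapshot
                and cpu.tlb_version == self.snapshot[cpu.cpu_id]]


cpus = [Processor(i) for i in range(3)]
cpus[0].touch(0x1000)
cpus[1].touch(0x1000)

# Only CPUs 0 and 1 are tracked for this page (per-resource tracking).
pending = PendingUnmap(0x1000, cpus[:2])

cpus[1].flush_tlb()  # CPU 1 flushes on its own before the page is reused
print(pending.cpus_needing_shootdown(cpus))
```

In this sketch only CPU 0 would still have to be interrupted before the page is re-allocated; CPU 2 was never tracked for the page, and CPU 1 has already flushed independently.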
