Your computer is already a distributed system. Why isn't your OS?

We argue that a new OS for a multicore machine should be designed ground-up as a distributed system, using concepts from that field. Modern hardware resembles a networked system even more than past large multiprocessors: in addition to familiar latency effects, it exhibits node heterogeneity and dynamic membership changes. Cache coherence protocols encourage OS designers to selectively ignore this, except for limited performance reasons. Commodity OS designs have mostly assumed fixed, uniform CPU architectures and memory systems. It is time for researchers to abandon this approach and engage fully with the distributed nature of the machine, carefully applying (but also modifying) ideas from distributed systems to the design of new operating systems. Our goal is to make it easier to design and construct robust OSes that effectively exploit heterogeneous, multicore hardware at scale. We approach this through a new OS architecture resembling a distributed system. The use of heterogeneous multicore in commodity computer systems, running dynamic workloads with increased reliance on OS services, will face new challenges not addressed by monolithic OSes in either generalpurpose or high-performance computing. It is possible that existing OS architectures can be evolved and scaled to address these challenges. However, we claim that stepping back and reconsidering OS structure is a better way to get insight into the problem, regardless of whether the goal is to retrofit new ideas to existing systems, or to replace them over time. In the next section we elaborate on why modern computers should be thought of as networked systems. Section 3 discusses the implications of distributed systems principles that are pertinent to new OS architecture, and Section 4 describes a possible architecture for an OS following these ideas. Section 5 lays out open questions at the intersection of distributed systems and OS research, and Section 6 concludes. L3

[1]  Scott Devine,et al.  Disco: running commodity operating systems on scalable multiprocessors , 1997, TOCS.

[2]  Pat Conway,et al.  The AMD Opteron Northbridge Architecture , 2007, IEEE Micro.

[3]  Yang Zhang,et al.  Corey: An Operating System for Many Cores , 2008, OSDI.

[4]  Dilma Da Silva,et al.  Experience distributing objects in an SMMP OS , 2007, TOCS.

[5]  Gernot Heiser,et al.  Hype and Virtue , 2007, HotOS.

[6]  Volkmar Uhlig,et al.  Scalability of microkernel-based systems , 2005 .

[7]  Brian N. Bershad,et al.  Improving the reliability of commodity operating systems , 2005, TOCS.

[8]  Alan L. Cox,et al.  TreadMarks: shared memory computing on networks of workstations , 1996 .

[9]  Brian N. Bershad,et al.  User-level interprocess communication for shared memory multiprocessors , 1991, TOCS.

[10]  Michael L. Scott,et al.  Kernel-Kernel communication in a shared-memory multiprocessor , 1993, Concurr. Pract. Exp..

[11]  Michael Stumm,et al.  Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system , 1999, OSDI '99.

[12]  Martin J. Bligh,et al.  Linux on NUMA Systems , 2010 .

[13]  Michael L. Scott,et al.  Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.

[14]  Adrian Schüpbach,et al.  Embracing diversity in the Barrelfish manycore operating system , 2008 .

[15]  Anant Agarwal,et al.  Factored operating systems (fos): the case for a scalable operating system for multicores , 2009, OPSR.

[16]  Jean-pierre Deschrevel The ansa model for trading and federation , 1993 .

[17]  Roger M. Needham,et al.  On the duality of operating system structures , 1979, OPSR.