The Case for a Factored Operating System (fos)

The next decade will afford us computer chips with 1,000 10,000 cores on a single piece of silicon. Contemporary operating systems have been designed to operate on a single core or small number of cores and hence are not well suited to manage and provide operating system services at such large scale. Managing 10,000 cores is so fundamentally different from managing two cores that the traditional evolutionary approach of operating system optimization will cease to work. The fundamental design of operating systems and operating system data structures must be rethought. This work begins by documenting the scalability problems of contemporary operating systems. These studies are used to motivate the design of a factored operating system (fos). fos is a new operating system targeting 1000+ core multicore systems where space sharing replaces traditional time sharing to increase scalability. fos is built as a collection of Internet inspired services. Each operating system service is factored into a fleet of communicating servers which in aggregate implement a system service. These servers are designed much in the way that distributed Internet services are designed, but instead of providing high level Internet services, these servers provide traditional kernel services and manage traditional kernel data structures in a factored, spatially distributed manner. The servers are bound to distinct processing cores and by doing so do not fight with end user applications for implicit resources such as TLBs and caches. Also, spatial distribution of these OS services facilitates locality as many operations only need to communicate with the nearest server for a given service.

[1]  Mahadev Satyanarayanan,et al.  Scalable, secure, and highly available distributed file access , 1990, Computer.

[2]  Robbert van Renesse,et al.  Using Sparse Capabilities in a Distributed Operating System , 1986, ICDCS.

[3]  William J. Bolosky,et al.  Mach: A New Kernel Foundation for UNIX Development , 1986, USENIX Summer.

[4]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[5]  Scott Devine,et al.  Disco: running commodity operating systems on scalable multiprocessors , 1997, TOCS.

[6]  Anoop Gupta,et al.  Hive: fault containment for shared-memory multiprocessors , 1995, SOSP.

[7]  Dan Hildebrand,et al.  An Architectural Overview of QNX , 1992, USENIX Workshop on Microkernels and Other Kernel Architectures.

[8]  Andrew R. Cherenson,et al.  The Sprite network operating system , 1988, Computer.

[9]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[10]  GhemawatSanjay,et al.  The Google file system , 2003 .

[11]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[12]  Vikas Agarwal,et al.  Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[13]  Dilma Da Silva,et al.  Experience with K42, an open-source, Linux-compatible, scalable operating-system kernel , 2005, IBM Syst. J..

[14]  Saurabh Dighe,et al.  An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[15]  Henry Hoffmann,et al.  Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[16]  Dilma Da Silva,et al.  Enabling Scalable Performance for General Purpose Workloads on Shared Memory Multiprocessors , 2003 .

[17]  Partha Dasgupta,et al.  The Design and Implementation of the Clouds Distributed Operating System , 1989, Comput. Syst..

[18]  Michael Stumm,et al.  Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system , 1999, OSDI '99.

[19]  Vivek Sarkar,et al.  Baring It All to Software: Raw Machines , 1997, Computer.

[20]  G.E. Moore,et al.  Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.

[21]  Mendel Rosenblum,et al.  Cellular disco: resource management using virtual clusters on shared-memory multiprocessors , 2000, TOCS.

[22]  Robbert van Renesse,et al.  The Amoeba distributed operating system - A status report , 1991, Comput. Commun..