mOS: an architecture for extreme-scale operating systems

Linux®, or more specifically the Linux API, plays a key role in HPC. Even for extreme-scale computing, production machines require a known and familiar API. However, an off-the-shelf Linux distribution faces challenges at extreme scale. To date, two approaches have been used to address the challenges of providing an operating system (OS) at extreme scale. In the Full-Weight Kernel (FWK) approach, an OS, typically Linux, forms the starting point, and features are removed from the environment so that it scales up across more cores and out across a large cluster. The Light-Weight Kernel (LWK) approach often starts with a new kernel, and functionality is added to provide a familiar API, typically that of Linux. Either approach, however, results in an execution environment that is not fully Linux compatible. mOS (multi Operating System) runs both an FWK (Linux) and an LWK simultaneously as kernels on the same compute node. mOS thereby achieves the scalability and reliability of LWKs while providing the full Linux functionality of an FWK. Further, mOS works in concert with Operating System Nodes (OSNs) to offload system calls, e.g., I/O, that are too invasive to run on the compute nodes at extreme scale. Beyond providing full Linux capability with LWK performance, other advantages of mOS include the ability to effectively manage different types of compute and memory resources, to interface easily with proposed asynchronous and fine-grained runtimes, and to nimbly manage new technologies. This paper is an architectural description of mOS. As a prototype is not yet finished, the contributions of this work are a description of mOS's architecture, an exploration of the tradeoffs and value of this approach for the purposes listed above, and a detailed architectural description of each of the six components of mOS, including the tradeoffs we considered.
The uptick in OS research indicates that many view the OS as an important area for reaching extreme scale. Most importantly, therefore, the goal of this paper is to generate discussion of this area at the workshop.
