Argo: An Exascale Operating System and Runtime

Exascale supercomputers are expected to comprise hundreds of thousands of heterogeneous compute nodes linked by complex networks. These nodes will combine general-purpose multicore processors with special-purpose accelerators targeting compute-intensive workloads, backed by deep multi-level memory hierarchies. The HPC community therefore expects exascale systems to require new programming models that take advantage of both intra-node and inter-node parallelism. The Argo project, funded under the DOE ExaOSR initiative, aims to provide an Operating System and Runtime (OS/R) designed to support extreme-scale scientific computations. To that end, Argo seeks to efficiently exploit new processor, memory, and interconnect technologies while addressing the new modalities, programming environments, and workflows expected at exascale. At the heart of the project are four key innovations: dynamic reconfiguration of node resources in response to workload changes, support for massive concurrency, a hierarchical framework for node management, and a cross-layer communication infrastructure that allows resource managers and optimizers to communicate efficiently across the platform. These innovations will result in an open-source prototype system that is expected to form the basis of production exascale systems deployed in the 2020 timeframe. We provide here an overall description of the project, before highlighting recent achievements in performance and integration with existing systems.