Improving performance by embedding HPC applications in lightweight Xen domains

Although general-purpose Operating Systems (OS) allow easy and cost-effective use of a wide range of machines, their programming interface and behavior often fail to meet, or even conflict with, the specific needs of High-Performance Computing (HPC) applications, such as limited preemption or control over memory and I/O management. This often leads to poor performance. At the same time, hypervisors are increasingly deployed underneath these OSes, for reasons such as easy deployment of dedicated environments or load balancing. Unlike the usual Unix process model, hypervisors give their guests access to kernel-level facilities. In this paper, we show how an HPC application and its execution environment can be embedded within a lightweight guest domain, running alongside a domain that hosts a conventional OS used only for administrative purposes. This lets the execution environment exploit kernel-level facilities to improve performance, which would be hard to achieve in the traditional process model because of missing support or excessive overhead.