System support for many task computing

The popularity of large-scale systems such as Blue Gene has extended their reach beyond HPC into commercial computing. Both communities want to broaden the scope of these machines from tightly coupled scientific applications running on MPI frameworks to more general-purpose workloads. Our approach addresses issues of scale by using the large number of nodes to distribute operating system services and components across the machine, and by tightly coupling the operating system to the interconnects to take full advantage of the unique capabilities of the HPC system. We plan to provision nodes for workload execution, aggregation, and system services, and to re-provision them dynamically as needed to accommodate changes, failures, and redundancy. By incorporating aggregation as a first-class system construct, we will provide dynamic hierarchical organization and management of all system resources. In this paper, we describe the design principles of our approach, using file systems, workload distribution, and system monitoring as illustrative examples. Our end goal is a cohesive distributed system that broadens the class of applications for large-scale systems and makes them more approachable for a larger class of developers and end users.
