Harnessing Virtual Machine Resource Control for Job Management

Virtual machine technology promises important benefits for grid computing and cluster batch job systems, including improved isolation, customizable workspaces, and support for checkpointing and migration. One way to gain these benefits is to “drill holes” in existing batch computing systems; however, we believe these new capabilities warrant a rethinking of the architectures of existing systems. We propose separating resource control for VMs into a new foundational layer that focuses narrowly on resource management. We present JAWS, a new batch computing service built as a thin layer above a resource control plane that enables it to share a common pool of networked cluster resources with other cluster services. JAWS executes jobs within isolated virtual machine workspaces. We discuss how exposing resource control allows JAWS to leverage VM-based resource isolation as a means to learn models of application behavior, and to use those models to guide scheduling policies for efficient resource
