Combining batch execution and leasing using virtual machines

As cluster computers are used for a wider range of applications, we encounter the need to deliver resources at particular times, to meet particular deadlines, and/or at the same time as other resources are provided elsewhere. To address such requirements, we describe a scheduling approach in which users request resource leases, where each lease specifies either an as-soon-as-possible ("best-effort") start or a reserved start time (an advance reservation). We present the design of a lease management architecture, Haizea, that implements leases as virtual machines (VMs), leveraging their ability to suspend, migrate, and resume computations and to provide leased resources with customized application environments. We discuss methods to minimize the overhead introduced by having to deploy VM images before the start of a lease. We also present the results of simulation studies that compare alternative approaches. Using workloads with various mixes of best-effort and advance reservation requests, we compare the performance of our VM-based approach with that of non-VM-based schedulers. We find that a VM-based approach can provide better performance (measured in terms of both total execution time and average delay incurred by best-effort requests) than a scheduler that does not support task pre-emption, and only slightly worse performance than a scheduler that does support task pre-emption. We also compare the impact of different VM image popularity distributions and VM image caching strategies on performance. These results emphasize the importance of VM image caching for the workloads studied and quantify the sensitivity of scheduling performance to VM image popularity distribution.
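To make the role of suspend/resume concrete, the following is a minimal toy sketch and not Haizea's actual algorithm. It assumes a single node, one best-effort lease submitted at time 0, and one advance reservation. The names `Reservation` and `best_effort_finish` and the `overhead` parameter (a lumped suspend-plus-resume cost) are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """An advance reservation with a fixed start time (hypothetical model)."""
    start: int
    duration: int


def best_effort_finish(work: int, reservation: Reservation,
                       can_suspend: bool, overhead: int = 0) -> int:
    """Finish time of a best-effort lease needing `work` time units.

    Assumptions (not from the paper): one node, the lease is submitted
    at time 0, and `overhead` is the total cost of suspending and
    resuming, charged once after the reservation ends.
    """
    if work <= reservation.start:
        # The lease completes before the reservation begins.
        return work

    reservation_end = reservation.start + reservation.duration
    if can_suspend:
        # Run until the reservation starts, suspend, then resume afterwards.
        remaining = work - reservation.start
        return reservation_end + overhead + remaining

    # Without suspension the lease cannot be interrupted, so the node
    # sits idle until the reservation ends and the lease runs in full.
    return reservation_end + work


if __name__ == "__main__":
    r = Reservation(start=6, duration=4)
    print(best_effort_finish(10, r, can_suspend=False))             # no pre-emption
    print(best_effort_finish(10, r, can_suspend=True))              # ideal pre-emption
    print(best_effort_finish(10, r, can_suspend=True, overhead=1))  # suspension with overhead
```

In this toy model, a lease that can be suspended uses the time before the reservation instead of leaving the node idle, and the overhead term stands in for the extra cost a VM-based scheduler pays relative to an idealized pre-emptive one.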
