Sharc: managing CPU and network bandwidth in shared clusters

We argue that commodity clusters need effective mechanisms for sharing resources among applications. To address this need, we present the design of Sharc, a system that enables resource sharing among applications in such clusters. Sharc depends on single-node resource management mechanisms, such as reservations or shares, and extends the benefits of these mechanisms to clustered environments. We present techniques for managing two important resources, CPU and network interface bandwidth, on a cluster-wide basis. These techniques allow Sharc to (1) support reservation of CPU and network interface bandwidth for distributed applications, (2) dynamically allocate resources based on past usage, and (3) provide performance isolation to applications. Our experimental evaluation shows that Sharc can scale to 256-node clusters running 100,000 applications. These results demonstrate that Sharc can be an effective approach for sharing resources among competing applications in moderate-size clusters.
