Management of distributed resource allocations in multi-cluster environments

We present a fully distributed solution for managing resource allocations for services running across multiple clusters in a large-scale cloud computing environment. Our solution allows individual services running across clusters to compete dynamically for allocations based on their rates of consumption while maintaining global, cloud-level allocation limits. The solution monitors the resource consumption of services spread across multiple clusters. Global polls are triggered only when the allocated balance in a cluster falls below a threshold, and allocations are then reassigned in a manner that avoids further immediate global polls. Our solution achieves scalability by minimizing global message exchanges, increases performance by distributing requests, and improves availability by avoiding a single point of failure. We perform a range of simulations to verify the accuracy of our approach, to validate our theoretical results, and to evaluate it against competing approaches.
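The abstract does not state the exact reassignment rule, so the following is only a minimal single-process sketch under our own assumptions. Each cluster reports a positive consumption rate. A global poll fires when a cluster's local balance falls below a fixed fraction of its last assignment. The poll then collects the remaining balances and redistributes them in proportion to the reported rates. The names `Cluster`, `AllocationManager`, and `threshold_fraction` are hypothetical, and a real deployment would exchange messages between clusters rather than call a shared object.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical local state of one cluster.
    name: str
    rate: float            # assumed: positive, externally reported consumption rate
    balance: float = 0.0   # allocation still available locally
    assigned: float = 0.0  # balance granted at the last (re)allocation


class AllocationManager:
    """Sketch: a global limit split across clusters, with threshold-triggered polls."""

    def __init__(self, clusters, global_limit, threshold_fraction=0.25):
        self.clusters = clusters
        self.threshold_fraction = threshold_fraction  # hypothetical parameter
        self.global_polls = 0
        self._assign(global_limit)

    def _assign(self, total):
        # Assumption: split `total` in proportion to consumption rates.
        total_rate = sum(c.rate for c in self.clusters)
        for c in self.clusters:
            c.balance = c.assigned = total * c.rate / total_rate

    def _global_poll(self):
        # Stand-in for one global message round: gather all remaining
        # balances and reassign them. Every cluster ends up exactly at its
        # new assignment, i.e. above its threshold, so no poll follows at once.
        self.global_polls += 1
        self._assign(sum(c.balance for c in self.clusters))

    def consume(self, cluster, amount):
        # Serve a request from the local balance; poll only when needed.
        if cluster.balance < amount:
            self._global_poll()
            if cluster.balance < amount:
                return False  # conservative rejection: never exceeds the global limit
        cluster.balance -= amount
        if cluster.balance < self.threshold_fraction * cluster.assigned:
            self._global_poll()
        return True

    def remaining(self):
        return sum(c.balance for c in self.clusters)
```

For example, with `Cluster("A", rate=3.0)` and `Cluster("B", rate=1.0)` sharing a limit of 100, A starts with 75 units and B with 25. B serves 18 unit requests locally without any global traffic, and the 19th request drops it below its threshold of 6.25 and triggers a single poll. Proportional reassignment is one plausible reading of "compete dynamically for allocations based on their rate of consumption"; other policies would fit the same poll structure.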
