Rapid Node Reallocation Between Virtual Clusters for Data Intensive Utility Computing

Utility computing achieves efficiencies by dynamically reallocating shared resources between services operating on virtual clusters. These efficiencies can be hard to realize for data-intensive applications, because newly allocated nodes must first be populated with large amounts of data, which impedes rapid node reallocation. We describe a data management architecture that uses disk caches on each node to reduce data copying and speed up node reallocation for data-intensive applications. Cache consistency management is simplified by extensive use of copy-on-write techniques. A data-driven scheme then selects nodes for reallocation between virtual clusters based on the amount of relevant data already in their caches. These nodes are identified using a novel technique that statistically samples cache contents. We demonstrate the benefits of this architecture using our implementation of an efficient block-level caching and copy-on-write target for the Linux device-mapper framework.
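The abstract does not specify the sampling estimator, so the following is only a minimal sketch of sampling-based node selection under stated assumptions. It assumes each node's cache can be enumerated as a list of block identifiers (for example, content hashes) and that the target virtual cluster's data set is a set of the same identifiers. For each node it samples up to k cached blocks uniformly without replacement, scales the observed hit fraction by the cache size to estimate how many relevant blocks the node already holds, and returns the n highest-ranked nodes. The function names `estimate_relevant_blocks` and `select_nodes`, the default sample size k = 64, and the block-identifier representation are all hypothetical.

```python
import random
from typing import Dict, Hashable, List, Set


def estimate_relevant_blocks(cache: List[Hashable], dataset: Set[Hashable],
                             k: int, rng: random.Random) -> float:
    """Estimate how many blocks in `cache` belong to `dataset`.

    Hypothetical estimator: draw up to k cached block IDs uniformly without
    replacement and scale the fraction that are relevant by the cache size.
    When k >= len(cache), the whole cache is inspected and the result is exact.
    """
    if not cache:
        return 0.0
    k = min(k, len(cache))
    sample = rng.sample(cache, k)
    hits = sum(1 for block in sample if block in dataset)
    return len(cache) * hits / k


def select_nodes(caches: Dict[str, List[Hashable]], dataset: Set[Hashable],
                 n: int, k: int = 64, seed: int = 0) -> List[str]:
    """Return the n nodes estimated to cache the most data relevant to `dataset`."""
    rng = random.Random(seed)  # seeded so that node selection is reproducible
    estimates = {node: estimate_relevant_blocks(blocks, dataset, k, rng)
                 for node, blocks in caches.items()}
    # Rank nodes by estimated relevant cached blocks, highest first.
    return sorted(estimates, key=lambda node: estimates[node], reverse=True)[:n]
```

For example, with a data set of block IDs 0 to 99 and three nodes caching IDs 0 to 39, 80 to 119, and 200 to 239, `select_nodes(caches, dataset, 2)` picks the first two nodes, because they hold an estimated 40 and 20 relevant blocks respectively. Sampling keeps the cost per node bounded by k membership checks regardless of cache size, at the price of an approximate ranking for large caches.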