Elastic Cloud Caches for Accelerating Service-Oriented Computations

Computing as a utility, that is, on-demand access to computing and storage infrastructure, has emerged in the form of the Cloud. In this model of computing, elastic resource allocation, i.e., the ability to scale resource allocation for specific applications, should be optimized to balance cost against performance. Meanwhile, the rise of information sharing and mining is driving pervasive sharing of Web services and data sets in the Cloud, and many data-intensive scientific applications are now being expressed as such services. In this paper, we explore an approach to accelerating service processing in a Cloud setting. We have developed a cooperative scheme for caching the data output by services for reuse. We propose algorithms for scaling our cache system up during peak querying times and back down to save costs. Using the Amazon EC2 public Cloud, we perform a detailed evaluation of our system, considering speedup and elastic scalability in terms of resource allocation and relaxation.
