Evaluating caching and storage options on the Amazon Web Services Cloud

With the promise of on-demand compute and storage resources, many users are deploying data-intensive scientific applications onto Clouds. To accelerate these applications, caching intermediate data using the elastic compute and storage framework has proved promising. To this end, we believe that an in-depth study of cache placement decisions over various Cloud storage options would be highly beneficial to a large class of users. While tangential analyses have been reported, ours focuses on the cost-performance tradeoffs of maintaining a data cache under various parameters of a Cloud application. We have compared several Amazon Web Services (AWS) resources as possible cache placements and found that application-dependent attributes such as unit-data size, total cache size, and persistence have far-reaching implications for the cost of sustaining the cache. Moreover, while instance-based caches yield higher cost, as expected, the performance that they afford may outweigh the savings of lower-cost options.
