An automated approach to cloud storage service selection

We present a new, automated approach to selecting the cloud storage service that best matches each dataset of a given application. Our approach relies on a machine-readable description of the capabilities (features, performance, cost, etc.) of each storage system, which is processed together with the user's specified requirements. The result is an assignment of datasets to storage systems that has multiple advantages: the resulting match meets the performance requirements and comes with a cost estimate, and users express their storage needs using high-level concepts rather than reading the documentation of different cloud providers and manually calculating or estimating a solution. Together with our XML schema for storage capabilities, we present use cases for our system that evaluate Amazon, Azure, and local clouds under several scenarios: choosing cloud storage services for a new application, estimating the cost savings from switching storage services, estimating how cost and performance evolve over time, and providing information for a migration from Amazon EC2 to Eucalyptus. Our application processes each use case in under 70 ms, and it can easily be extended to account for new features and data requirements.
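The matching step described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the capability fields, the service names, and every number below are hypothetical placeholders, and the selection rule (filter by required features and minimum read throughput, then pick the lowest estimated monthly cost) is only one plausible reading of "best matches". The XML capability descriptions are replaced here by plain Python objects.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class StorageService:
    """Hypothetical capability record for one storage service."""
    name: str
    features: FrozenSet[str]
    read_mbps: float            # assumed sustained read throughput
    usd_per_gb_month: float     # assumed flat storage price


@dataclass(frozen=True)
class Dataset:
    """Hypothetical high-level requirements for one dataset."""
    name: str
    size_gb: float
    required_features: FrozenSet[str]
    min_read_mbps: float


def monthly_cost(service: StorageService, dataset: Dataset) -> float:
    """Estimate the monthly storage cost of a dataset on a service."""
    return service.usd_per_gb_month * dataset.size_gb


def select_service(dataset: Dataset,
                   services: List[StorageService]) -> Optional[StorageService]:
    """Return the cheapest service that satisfies the dataset, or None."""
    candidates = [
        s for s in services
        if dataset.required_features <= s.features
        and s.read_mbps >= dataset.min_read_mbps
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: monthly_cost(s, dataset))


def assign(datasets: List[Dataset],
           services: List[StorageService]) -> Dict[str, Optional[StorageService]]:
    """Map each dataset name to its selected service."""
    return {d.name: select_service(d, services) for d in datasets}


# Made-up services and datasets, for illustration only.
SERVICES = [
    StorageService("blob-store", frozenset({"large_objects"}), 50.0, 0.10),
    StorageService("table-store", frozenset({"queries"}), 20.0, 0.25),
    StorageService("local-disk", frozenset({"large_objects", "queries"}), 80.0, 0.40),
]
DATASETS = [
    Dataset("images", 500.0, frozenset({"large_objects"}), 40.0),
    Dataset("index", 10.0, frozenset({"queries"}), 10.0),
    Dataset("scratch", 5.0, frozenset({"large_objects", "queries"}), 60.0),
    Dataset("archive", 100.0, frozenset({"large_objects"}), 1000.0),
]

ASSIGNMENT = assign(DATASETS, SERVICES)
```

In this toy setup, a dataset that no service can satisfy (here "archive", whose throughput requirement exceeds every service) maps to `None` rather than to a poor match, so unmet requirements stay visible to the user.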
