Evaluating Cloud Storage Services for Tightly-Coupled Applications

The emergence of Cloud computing has given rise to numerous attempts to study the portability of scientific applications to this new paradigm. Tightly-coupled applications are a common class of scientific HPC applications, with specific requirements that were previously addressed by supercomputers. A key challenge in adopting the Cloud paradigm for such applications is data management. In this paper, we argue that Cloud storage services are a suitable option for storing and sharing the data of Cloud applications. We evaluate a distributed storage plugin for Cumulus, an S3-compatible open-source Cloud service, and we conduct a series of experiments with an atmospheric modeling application running in a private Cloud deployed on the Grid'5000 testbed. Our results, obtained with up to 144 parallel processes, show that the application scales with both the size of the data and the number of processes, while storing 50 GB of output data on a Cloud storage service.
