Converting a High Performance Application to an Elastic Cloud Application

Over the past decade, high performance applications have embraced parallel programming and computing models. While parallel computing makes good use of dedicated hardware resources, it also has several drawbacks: poor fault tolerance, limited scalability, and a limited ability to harness resources that become available at run time. Cloud computing presents a viable and promising alternative to parallel computing because it offers a distributed computing model. In this work, we establish directives that serve as guidelines for designing and implementing, or identifying, a cloud computing framework suitable for building a high performance application for the cloud or converting an existing one to run there. We show that following these directives leads to an elastic implementation that has better scalability, run-time resource adaptability, fault tolerance, and portability across cloud computing platforms, while requiring minimal effort and intervention from the user. We illustrate this by converting an MPI implementation of replica exchange, a parallel tempering molecular dynamics application, to an elastic cloud application using the Work Queue framework, which adheres to these directives. We observe better scalability and resource adaptability of this elastic application on multiple platforms, including a homogeneous cluster environment (SGE) and heterogeneous cloud computing environments such as Microsoft Azure and Amazon EC2.
