Scalability of parallel scientific applications on the cloud

Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-hungry scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix-vector operations and the NAS parallel benchmarks, as well as DOUG (Domain decomposition On Unstructured Grids). DOUG is an open-source software package for the parallel iterative solution of very large sparse systems of linear equations. A detailed analysis of DOUG on the cloud showed that parallel applications benefit considerably from the cloud and scale reasonably well on it. We also observed the limitations of the cloud and compared its performance with that of a cluster. However, to run scientific applications efficiently on cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit cloud resources, such as the MapReduce framework. We reduced several iterative and embarrassingly parallel algorithms to the MapReduce model and measured and analyzed their performance. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it is well suited to embarrassingly parallel algorithms. Because scientific computing often relies on iterative methods to solve large problems, this paper argues that scientific computing on the cloud needs better frameworks or optimizations for MapReduce.
