HPC on Competitive Cloud Resources

Computing as a utility has reached the mainstream. Scientists can now easily rent time on large commercial clusters that can be expanded and reduced on demand in real time. However, the performance of current commercial cloud computing falls short of systems designed specifically for scientific applications. The needs of scientific computing are quite different from those of the web applications that have been the focus of cloud computing vendors. In this chapter we evaluate empirically the computational efficiency of high-performance numerical applications in a commercial cloud environment when resources are shared under high contention. Using the Linpack benchmark as a case study, we show that cache utilization becomes highly unpredictable and that computation time becomes similarly unpredictable. For some problems, not only is it more efficient to underutilize resources, but the solution can also be reached sooner in real time (wall-clock time). We also show that the smallest, cheapest (64-bit) instance in the studied environment offers the best price-to-performance ratio. In light of the high contention we observe, we believe that alternative definitions of efficiency should be introduced for commercial cloud environments, where strong performance guarantees do not exist. Concepts such as average or expected performance and execution time, expected cost to completion, and variance measures, traditionally ignored in the high-performance computing context, should now complement or even replace the standard definitions of efficiency.
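To make the proposed efficiency measures concrete, the following minimal Python sketch computes expected execution time, its variability, expected performance, and expected cost to completion from repeated runs of the same problem. It is an illustration only, not the chapter's method: the function name `summarize_runs`, the sample timings, the price, and the assumption that cost scales linearly with wall-clock time (real providers may bill in coarser increments) are all hypothetical.

```python
import statistics


def summarize_runs(wall_times_s, price_per_hour, gflop_per_run):
    """Summarize repeated runs of one problem on a shared instance.

    wall_times_s   -- measured wall-clock times in seconds (hypothetical data)
    price_per_hour -- instance price in dollars per hour (hypothetical)
    gflop_per_run  -- floating-point work per run, in GFLOP (hypothetical)
    """
    mean_time = statistics.mean(wall_times_s)
    std_time = statistics.stdev(wall_times_s)  # sample standard deviation

    # Assumption: cost is proportional to wall-clock time (no rounding
    # up to billing increments).
    costs = [t / 3600.0 * price_per_hour for t in wall_times_s]

    # Achieved rate of each run in GFLOP/s, then averaged over runs.
    rates = [gflop_per_run / t for t in wall_times_s]

    return {
        "expected_time_s": mean_time,
        "time_std_s": std_time,
        "time_cv": std_time / mean_time,  # coefficient of variation
        "expected_gflops": statistics.mean(rates),
        "expected_cost": statistics.mean(costs),
    }


if __name__ == "__main__":
    # Made-up timings for three runs of the same problem size.
    summary = summarize_runs([100.0, 200.0, 300.0],
                             price_per_hour=0.36,
                             gflop_per_run=600.0)
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

Reporting the spread of execution time alongside its mean reflects the point made above: when performance is not guaranteed, a single peak or best-case figure says little about what a user should expect to wait or pay.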
