Breaking HPC Barriers with the 56GbE Cloud

Abstract With the widespread adoption of cloud computing, high-performance computing (HPC) is no longer limited to organisations with the funds and manpower necessary to house and run a supercomputer. However, the performance of large-scale scientific applications in the cloud has historically been constrained by latency and bandwidth. These constraints stem largely from the design decisions of cloud providers, who primarily target high-density workloads such as web services and data hosting. In this paper, we provide an overview of a high-performance OpenStack cloud implementation at the National Computational Infrastructure (NCI). This cloud is targeted at high-performance scientific applications and enables scientists to build their own clusters when their demands and software stacks conflict with traditional bare-metal HPC environments. We present the architecture of our 56 GbE cloud and a preliminary set of HPC benchmark results compared against a more traditional cloud and a native InfiniBand HPC environment. Three network interconnects and configurations were tested as part of the cloud deployment: 10 GbE, 56 GbE fat-tree Ethernet, and native FDR full fat-tree InfiniBand (IB). These three solutions are discussed from the viewpoint of on-demand HPC clusters, focusing on bandwidth, latency and security. A detailed analysis of these metrics in the context of micro-benchmarks and scientific applications is presented, including the effects of using TCP and RDMA on scientific applications.
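As a concrete illustration of the kind of latency micro-benchmark used to compare interconnects such as 10 GbE, 56 GbE and FDR InfiniBand, the sketch below shows a minimal MPI ping-pong measurement between two ranks. This is not the benchmark code used in the paper; the message size, iteration count and timing approach are illustrative assumptions.

```c
/* Minimal MPI ping-pong latency sketch (illustrative only; not the
 * paper's benchmark code). Run with exactly two ranks, e.g.
 *   mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 1000;   /* assumed iteration count */
    const int msg_size = 8;   /* small 8-byte message: latency-dominated */
    char buf[8] = {0};
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "Run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* One round trip contains two one-way transfers. */
        double one_way_us = (t1 - t0) / (2.0 * iters) * 1e6;
        printf("Average one-way latency: %.2f us\n", one_way_us);
    }

    MPI_Finalize();
    return 0;
}
```

The same application code can typically be run over either a TCP/IP transport or an RDMA-capable transport (for example, RoCE on the Ethernet fabrics or verbs on InfiniBand) by selecting the MPI library's network module at run time, which is the kind of TCP versus RDMA comparison the abstract refers to.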
