Traditional HPC (High Performance Computing) clusters are best suited for well-formed calculations. The orderly, batch-oriented HPC cluster offers maximal potential performance per application, but limits resource efficiency and user flexibility. An HPC cloud can host multiple virtual HPC clusters, giving scientists unprecedented flexibility for research and development. With the proper incentive model, resource efficiency will be maximized automatically. In this context, there are three new challenges. The first is virtualization overhead. The second is the administrative complexity scientists face in managing virtual clusters. The third is the programming model: existing HPC programming models were designed for dedicated, homogeneous parallel processors, whereas an HPC cloud is typically heterogeneous and shared. This paper reports on the practice and experience of building a private HPC cloud using a subset of a traditional HPC cluster. We report our evaluation criteria using Open Source software, along with performance studies for compute-intensive and data-intensive applications. We also report the design and implementation of HPCFY, a Puppet-based virtual cluster administration tool. In addition, we show that even when virtualization overhead is present, virtual clusters can scale efficiently once the effects of that overhead on various types of HPC and Big Data workloads are understood. We aim to provide a detailed experience report to the HPC community, to ease the process of building a private HPC cloud using Open Source software.