The Impact of Virtualization on High Performance Computing Clustering in the Cloud

The growing pervasiveness of Internet access is sharply increasing Big Data production. This, in turn, increases the demand for compute power to process this massive data, making High Performance Computing (HPC) a highly solicited service. Based on the paradigm of providing computing as a utility, the Cloud offers user-friendly infrastructures for processing Big Data, e.g., High Performance Computing as a Service (HPCaaS). Still, HPCaaS performance is tightly coupled with the underlying virtualization technique, since the latter is responsible for creating the virtual machines that carry out data processing jobs. In this paper, the authors evaluate the impact of virtualization on HPCaaS. They track HPC performance under different Cloud virtualization platforms, namely KVM and VMware ESXi, and compare it against physical clusters. Each tested cluster showed different performance trends. Yet, the overall analysis of the findings showed that the selection of virtualization technology can lead to significant improvements when handling HPCaaS.
