On Improvement of Cloud Virtual Machine Availability with Virtualization Fault Tolerance Mechanism

Virtualization is a common strategy for making better use of existing computing resources, particularly in cloud computing. Hadoop, an Apache project, is designed to scale from a single server to thousands of machines, each offering local computation and storage. However, guaranteeing the stability and reliability of such systems remains an important research topic. In this article, we build on existing open-source software and platforms, such as the Xen hypervisor virtualization technology and the OpenNebula virtual machine management tool. After extending the capabilities of these components, we developed a mechanism, called Virtualization Fault Tolerance (VFT), that provides high availability for Hadoop. We consider a practical problem that occurs frequently in our system, and the results in this paper confirm that downtime can be shortened when a failure occurs. The mechanism is not limited to Hadoop applications; it can also be extended to other cluster-based systems.
