Fault Tolerance in Hadoop for Work Migration

2. Introduction

Hadoop [1] is an open-source software framework, implemented in Java, that is designed for large distributed systems. Hadoop is a project of the Apache Software Foundation and owes its popularity, in part, to being open source. Yahoo! has contributed about 80% of the main core of Hadoop [3], but many other large technology organizations have used or currently use Hadoop, including Facebook, Twitter, and LinkedIn [3]. The Hadoop framework comprises many subprojects; two of the main ones are the Hadoop Distributed File System (HDFS) and MapReduce. HDFS is designed to work with the MapReduce programming paradigm. This survey focuses on HDFS and on how it was implemented to be highly fault tolerant, because fault tolerance is an essential property of modern distributed systems.
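To make the MapReduce paradigm mentioned above concrete, the following is a minimal in-memory sketch of its map/shuffle/reduce phases, using the classic word-count example. This is an illustration of the programming model only, written against the plain Java standard library; it does not use the actual Hadoop MapReduce API, and the class and method names are hypothetical.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the map/reduce paradigm (not the Hadoop API):
// map emits (word, 1) pairs, a shuffle step groups them by key,
// and reduce sums the counts for each key.
public class WordCountSketch {

    // "map" phase: split one input line into (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // "reduce" phase: combine all values grouped under one key
    static int reduce(List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    static Map<String, Integer> run(List<String> lines) {
        // "shuffle" step: group intermediate pairs by key,
        // as the framework would do between the two phases
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> pair : map(line))
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());

        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hadoop stores data",
                                       "hadoop processes data")));
        // prints {data=2, hadoop=2, processes=1, stores=1}
    }
}
```

In the real framework, the map and reduce functions run in parallel across the cluster, and fault tolerance comes from re-executing failed tasks over replicated HDFS blocks rather than from anything visible in the user's map/reduce code.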

[1] Alan L. Cox et al., "The Hadoop distributed filesystem: Balancing portability and performance," 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), 2010.

[2] Anand Sivasubramaniam et al., "The Impact of Migration on Parallel Job Scheduling for Distributed Systems," Euro-Par, 2000.

[3] Miguel Correia et al., "Making Hadoop MapReduce Byzantine Fault-Tolerant," DSN, 2010.

[4] Takashi Nanya, "Fault-Tolerance Techniques in Distributed Systems," 1992.

[5] Hairong Kuang et al., "The Hadoop Distributed File System," 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.

[6] Bo Dong et al., "Hadoop high availability through metadata replication," CloudDB@CIKM, 2009.