A JobTracker Fault-tolerant Mechanism for Reducing Job Completion Time in the MapReduce Framework

Providing cloud computing effectively requires an IT infrastructure that supports a distributed file system and parallel data processing. To this end, the MapReduce framework has been widely used for distributed processing of large-scale data, and it has proven to be an efficient way to build distributed, parallel processing systems at relatively low cost. However, the JobTracker, which is responsible for scheduling and assigning all MapReduce tasks, is a single point of failure (SPOF). When the JobTracker fails, the completion time of a MapReduce job increases because all of the job's tasks must be restarted. To resolve this problem, we designed and implemented a JobTracker fault-tolerant mechanism for the MapReduce framework. We evaluated the mechanism on a MapReduce testbed using fault injection. The results show that the mechanism reduces the average job completion time by about 46.5% to 64.4% compared with naive MapReduce.
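The effect of a JobTracker failure on completion time can be illustrated with a toy timing model. The following is a minimal sketch and not the mechanism evaluated here: it assumes a job whose failure-free run takes `total_work` time units, a single JobTracker failure at `failure_time`, and a fixed `recovery_delay` before scheduling resumes. All function names, parameters, and numbers are hypothetical and chosen only for illustration.

```python
def naive_completion_time(total_work: float, failure_time: float,
                          recovery_delay: float) -> float:
    """Completion time when every task restarts after a JobTracker failure.

    Progress made before the failure is lost, so the full job is rerun
    once the JobTracker is back (assumed model, not measured behavior).
    """
    if failure_time >= total_work:
        return total_work  # job finished before the failure occurred
    return failure_time + recovery_delay + total_work


def recovering_completion_time(total_work: float, failure_time: float,
                               recovery_delay: float) -> float:
    """Completion time when job state survives the JobTracker failure.

    Only the remaining work is executed after recovery, so the failure
    costs just the recovery delay (assumed model).
    """
    if failure_time >= total_work:
        return total_work
    remaining = total_work - failure_time
    return failure_time + recovery_delay + remaining


def relative_reduction(naive: float, recovering: float) -> float:
    """Fraction of the naive completion time saved by recovery."""
    return 1.0 - recovering / naive


if __name__ == "__main__":
    # Hypothetical values: 100-unit job, failure at t=60, 5-unit recovery.
    naive = naive_completion_time(100.0, 60.0, 5.0)
    recovering = recovering_completion_time(100.0, 60.0, 5.0)
    print(naive, recovering, relative_reduction(naive, recovering))
```

In this model the saving grows as the failure occurs later in the job, because more completed work would otherwise be discarded. The model ignores partial task progress, repeated failures, and any overhead of preserving state, so its output should not be compared with the measured results.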