Implementation and Evaluation of the JobTracker Initiative Task Scheduling on Hadoop

MapReduce is one of the most successful frameworks for processing large-scale data efficiently. Distributed programs can be implemented easily by describing only two functions, Map and Reduce. In Hadoop, an open-source implementation of MapReduce, a JobTracker (the master program) assigns Map tasks and Reduce tasks to TaskTrackers (the slave programs that execute them). When multiple Hadoop instances run on the same physical machines, the machines' computational resources must be shared among all of them (a Multi-Hadoop environment). In such an environment, the resources available to each Hadoop instance fluctuate dynamically with the behavior of the other instances, so the JobTracker needs to decide task assignments based on the loads and available computational resources of the cluster (a JobTracker Initiative Task Scheduler). In this paper, we propose a method that decides the number of tasks to execute on each computer based on its load, in order to use computational resources efficiently. We evaluate its performance, and our results show that the proposed method reduces job execution times by about 11.1% in a Multi-Hadoop environment compared to the original Hadoop.
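The core idea of a load-driven scheduler like the one described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function name `decide_task_slots` and its policy (grant at most one task per free core, capped by the configured slot count) are assumptions for exposition only.

```python
def decide_task_slots(num_cores: int, load_avg: float, max_slots: int) -> int:
    """Hypothetical load-driven slot allocation for a heartbeat response.

    num_cores: physical cores on the TaskTracker's machine
    load_avg:  current load average on that machine (may include load
               from other Hadoop instances sharing it)
    max_slots: configured maximum task slots for this TaskTracker
    """
    # Estimate free capacity: cores not already busy with load,
    # including load generated by co-located Hadoop instances.
    free_capacity = max(0.0, num_cores - load_avg)
    # Grant at most one concurrent task per free core,
    # never exceeding the configured slot limit.
    return min(max_slots, int(free_capacity))
```

For example, on an 8-core machine with a load average of 5.5 and 4 configured slots, the sketch grants 2 tasks; if another Hadoop instance pushes the load above the core count, it grants none, letting the JobTracker steer work toward less-loaded machines.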
