A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop

As a distributed computing platform, Hadoop provides an effective way to handle big data. In Hadoop, a straggler task delays the completion of the job it belongs to. Although the root cause of a straggler is hard to identify, speculative execution is commonly used to deal with the problem by simply backing up stragglers on alternative nodes. In this paper, we design a new Speculative Execution algorithm based on the C4.5 Decision Tree, SECDT, for Hadoop. SECDT uses a C4.5 decision tree to estimate the completion time of each straggler and of its backup task. It then computes the difference between the two estimates for each straggler and starts a backup task for the straggler with the largest difference. Experimental results show that SECDT predicts execution time more accurately than other speculative execution methods and hence reduces job completion time.
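
The selection step described above can be illustrated with a minimal Python sketch. It is not the SECDT implementation: the `Candidate` type, its field names, and the `pick_backup` function are hypothetical, the two completion-time estimates are taken as given inputs rather than produced by a C4.5 tree, and skipping candidates whose difference is not positive is an added assumption that the abstract does not state.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A straggler with two predicted completion times (hypothetical type)."""
    task_id: str
    straggler_time: float  # predicted completion time if the straggler keeps running
    backup_time: float     # predicted completion time of a backup on another node

    @property
    def difference(self) -> float:
        # Time saved by running the backup instead of waiting for the straggler.
        return self.straggler_time - self.backup_time


def pick_backup(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the straggler with the largest difference, or None.

    Assumption: a backup is only worthwhile when the difference is positive.
    """
    best = max(candidates, key=lambda c: c.difference, default=None)
    if best is None or best.difference <= 0:
        return None
    return best


if __name__ == "__main__":
    stragglers = [
        Candidate("task_a", straggler_time=100.0, backup_time=60.0),
        Candidate("task_b", straggler_time=80.0, backup_time=20.0),
        Candidate("task_c", straggler_time=50.0, backup_time=70.0),
    ]
    chosen = pick_backup(stragglers)
    print(chosen.task_id if chosen else "no backup started")
```

In this made-up example, `task_b` has the largest difference between the two estimates, so it is the one selected for a backup.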
