An Optimized Speculative Execution Strategy Based on Local Data Prediction in a Heterogeneous Hadoop Environment

Hadoop is a widely used distributed computing framework for processing large-scale data. Straggling tasks seriously degrade Hadoop performance because of the imbalanced distribution of slow tasks. Speculative execution (SE) addresses stragglers by monitoring the real-time progress of running tasks and replicating potential stragglers on another node, increasing the chance that a backup task completes ahead of the original. Existing SE strategies face challenges such as misjudging which tasks are stragglers and selecting unsuitable backup nodes, which make both SE and the Hadoop system that relies on it inefficient. In this paper, we propose an optimized SE strategy based on local data prediction. It collects task execution information in real time, uses Locally Weighted Regression (LWR) to predict the remaining time of each running task, and selects an appropriate backup node according to the actual requirements. It also incorporates a cost-benefit model to maximize the effectiveness of SE. The results show that the proposed SE strategy, implemented in Hadoop-2.6.0, selects potential straggler candidates more accurately and performs better in various situations in a heterogeneous Hadoop environment.
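To make the LWR-based prediction step concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes each running task reports (progress, elapsed time) samples, fits a one-dimensional locally weighted linear regression with a Gaussian kernel centered on the query point, and evaluates the fit at progress 1.0 to estimate the finish time. The function names, the choice of features, and the `bandwidth` parameter are all hypothetical.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth=0.3):
    """Locally weighted linear regression in one dimension.

    Samples near x_query get larger Gaussian weights, so the fitted
    line reflects the local trend around the query point.
    """
    weights = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if var_x == 0:
        # All samples share one x value: fall back to the weighted mean.
        return mean_y
    cov_xy = sum(w * (x - mean_x) * (y - mean_y)
                 for w, x, y in zip(weights, xs, ys))
    slope = cov_xy / var_x
    return mean_y + slope * (x_query - mean_x)


def estimate_remaining_time(progress, elapsed, bandwidth=0.3):
    """Estimate remaining seconds for a task (hypothetical helper).

    progress: observed progress scores in [0, 1], oldest first
    elapsed:  elapsed seconds at each observation, same order
    """
    predicted_finish = lwr_predict(progress, elapsed, 1.0, bandwidth)
    return max(0.0, predicted_finish - elapsed[-1])


# Example: a task that has advanced 10% every 10 seconds so far.
remaining = estimate_remaining_time([0.1, 0.2, 0.3, 0.4], [10.0, 20.0, 30.0, 40.0])
```

Because the kernel is centered on progress 1.0, the most recent samples dominate the fit, so a task that has recently slowed down receives a longer remaining-time estimate than a global average rate would give. How such an estimate feeds into backup node selection and the cost-benefit model is not specified in the abstract and is left out of this sketch.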
