A Survey of Speculative Execution Strategy in MapReduce

MapReduce is a parallel programming model designed to process large-scale data, so both the correctness and the efficiency of its computation must be assured. Speculative execution is an effective fault-tolerance technique for this purpose: it identifies slow tasks and launches speculative copies of them on faster machines, which shortens job execution time and increases cluster throughput. Hadoop's naive speculative execution strategy assumes that the cluster is homogeneous, and this assumption leads to poor performance in heterogeneous environments. This paper reviews several speculative execution strategies that aim to improve MapReduce performance in heterogeneous environments, namely LATE, MCP, ex-MCP and ERUL, and then presents a comparison of these methods.
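The general idea of selecting slow tasks and re-running them on a fast machine can be illustrated with a minimal sketch. It is a simplified progress-threshold heuristic written for illustration only, not the implementation of Hadoop, LATE, MCP, ex-MCP or ERUL. The names `Task`, `select_slow_tasks` and `pick_fast_node`, the `gap` value of 0.2, and the per-node speed scores are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    progress: float  # fraction of work completed, from 0.0 to 1.0


def select_slow_tasks(tasks, gap=0.2):
    """Return running tasks whose progress trails the mean by more than `gap`.

    `gap` is an assumed tuning parameter for this sketch.
    """
    running = [t for t in tasks if t.progress < 1.0]
    if not running:
        return []
    mean_progress = sum(t.progress for t in running) / len(running)
    return [t for t in running if t.progress < mean_progress - gap]


def pick_fast_node(node_speeds):
    """Return the node with the highest (hypothetical) speed score."""
    return max(node_speeds, key=node_speeds.get)


if __name__ == "__main__":
    tasks = [Task("t1", 0.90), Task("t2", 0.85), Task("t3", 0.30)]
    nodes = {"node-a": 1.0, "node-b": 2.5}
    for task in select_slow_tasks(tasks):
        # A speculative copy of each slow task would be launched here.
        print(f"speculate {task.task_id} on {pick_fast_node(nodes)}")
```

A threshold relative to the mean progress works only when all nodes run at similar speeds; in a heterogeneous cluster, tasks on slow nodes always trail the mean, which is the weakness that motivates the strategies reviewed in this paper.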
