Improving Fair Scheduling Performance on Hadoop

Cloud computing is a promising approach for dealing with big data. Apache Hadoop, which provides the MapReduce parallel processing framework, has become a popular system for distributed storage and processing of large data sets on computer clusters. The performance of Hadoop in parallel data processing depends on the efficiency of the underlying MapReduce scheduling algorithm. In this paper, we improve the performance of the well-known fair scheduling algorithm adopted in Hadoop by introducing several mechanisms. The modified scheduling algorithm adapts to runtime conditions, with the objectives of job fairness and short response time. Performance evaluations show that the proposed algorithm outperforms the original fair sharing algorithm.
