Task Scheduling in Big Data Platforms: A Systematic Literature Review

Context: Hadoop, Spark, Storm, and Mesos are well-known frameworks in both the research and industrial communities for expressing and processing distributed computations on massive amounts of data. Multiple scheduling algorithms have been proposed to ensure that short interactive jobs, large batch jobs, and guaranteed-capacity production jobs running on these frameworks can deliver results quickly while maintaining high throughput. However, only a few works have examined the effectiveness of these algorithms.

Objective: The Evidence-based Software Engineering (EBSE) paradigm and its core tool, the Systematic Literature Review (SLR), were introduced to the Software Engineering community in 2004 to help researchers systematically and objectively gather and aggregate research evidence on different topics. In this paper, we conduct an SLR of task scheduling algorithms that have been proposed for big data platforms.

Method: We analyse the design decisions of the scheduling models proposed in the literature for Hadoop, Spark, Storm, and Mesos between 2005 and 2016. We provide a research taxonomy for succinct classification of these scheduling models. We also compare the algorithms in terms of performance, resource utilization, and failure recovery mechanisms.

Results: Our searches identify 586 studies from the highest-quality journals, conferences, and workshops in this field. This SLR reports on the different types of scheduling models (dynamic, constrained, and adaptive) and the main motivations behind them (including data locality, workload balancing, resource utilization, and energy efficiency). We also discuss open issues and future challenges for improving on the current studies.
