Distributed task scheduling with security and outage constraints in MapReduce

MapReduce, a simple software framework, makes it possible to process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. It has recently attracted extensive research and gained wide popularity. In this paper, we consider the MapReduce task scheduling problem with security and outage constraints, which affect performance and have not been well resolved. The objective is to minimize the makespan while meeting data locality and security requirements. A heuristic algorithm with three components is proposed for the problem under study. The simulation results verify the effectiveness of the proposed method, which is closely dependent on the outage probability and the number of worker nodes.
