Makespan Minimization for Batch Tasks in Data Centers

Makespan is one for crucial factors to determine the performance of job scheduling in cloud data center, short makespan could lead to more job throughput and less energy consumption. In this paper, we study the joint task and data assignment problem to realized makespan minimization. We propose the data migration method to overcome the memory space limitation of servers, and realize better data locality for task execution. We conduct extensive simulations, and the simulation results show that our algorithm has significant improvement on makespan reduction.

[1]  Xiaoqiao Meng,et al.  Coupling task progress for MapReduce resource-aware scheduling , 2013, 2013 Proceedings IEEE INFOCOM.

[2]  Scott Shenker,et al.  Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling , 2010, EuroSys '10.

[3]  Albert G. Greenberg,et al.  Scarlett: coping with skewed content popularity in mapreduce clusters , 2011, EuroSys '11.

[4]  Siva Theja Maguluri,et al.  Scheduling jobs with unknown duration in clouds , 2013, INFOCOM 2013.

[5]  Srikanth Kandula,et al.  Multi-resource packing for cluster schedulers , 2015, SIGCOMM.

[6]  Ishai Menache,et al.  Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can , 2015, SIGCOMM.

[7]  Lei Ying,et al.  Map task scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality , 2013, INFOCOM.

[8]  Kun-Lung Wu,et al.  FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads , 2010, Middleware.

[9]  Roy H. Campbell,et al.  Two Sides of a Coin: Optimizing the Schedule of MapReduce Jobs to Minimize Their Makespan and Improve Cluster Performance , 2012, 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

[10]  Anand Raghunathan,et al.  ShuffleWatcher: Shuffle-aware Scheduling in Multi-tenant MapReduce Clusters , 2014, USENIX Annual Technical Conference.

[11]  Yuanyuan Tian,et al.  CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop , 2011, Proc. VLDB Endow..

[12]  Joseph Y.-T. Leung,et al.  Handbook of Scheduling: Algorithms, Models, and Performance Analysis , 2004 .

[13]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[14]  Srikanth Kandula,et al.  PACMan: Coordinated Memory Caching for Parallel Jobs , 2012, NSDI.