Scheduling Mixed Real-Time and Non-Real-Time Applications in MapReduce Environment

MapReduce scheduling is becoming a hot topic as MapReduce attracts more and more attention from both industry and academia. In this paper, we focus on scheduling mixed real-time and non-real-time applications in a MapReduce environment, a challenging problem that has received only limited attention. To solve this problem, we present a two-level MapReduce scheduler built on previous techniques and make two key contributions. First, to meet the performance goals of real-time applications, we propose a deadline scheduler that adopts (1) a sampling-based approach, Tasks Forward Scheduling (TFS), to predict map/reduce task execution time (unlike prior work, which requires users to supply an estimated value), and (2) a resource allocation model, Approximately Uniform Minimum Degree of parallelism (AUMD), that dynamically constrains each real-time job to execute with the minimum task assignment at any time so as to maximize the number of concurrent real-time jobs. Second, by integrating this deadline scheduler into an existing MapReduce scheduler, we develop a two-level scheduler that supports resource preemption and can schedule mixed real-time and non-real-time jobs according to their respective performance demands. We implement our scheduler in Hadoop, and experiments on a real, small-scale cluster demonstrate that it can schedule mixed real-time and non-real-time jobs to meet their different quality-of-service (QoS) demands.
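To make the idea of a "minimum degree of parallelism" concrete, the following is a minimal illustrative sketch, not the TFS or AUMD algorithms themselves, whose details are not given in this abstract. It assumes that a task's execution time can be approximated by the mean duration of a few sampled tasks, and that the remaining tasks run in equal-length waves. The function names `estimate_task_time` and `min_parallelism` are hypothetical.

```python
import math


def estimate_task_time(sample_durations):
    """Approximate per-task execution time (seconds) as the mean of sampled tasks.

    Assumption for illustration only: a plain average of a few finished tasks.
    """
    if not sample_durations:
        raise ValueError("need at least one sampled task duration")
    return sum(sample_durations) / len(sample_durations)


def min_parallelism(remaining_tasks, task_time, time_to_deadline):
    """Smallest number of concurrent slots that finishes the remaining tasks
    before the deadline, assuming tasks run in waves of equal length.

    Returns None when not even one wave fits before the deadline.
    """
    if task_time <= 0:
        raise ValueError("task_time must be positive")
    if remaining_tasks == 0:
        return 0
    # Number of complete waves of tasks that fit before the deadline.
    waves = int(time_to_deadline // task_time)
    if waves < 1:
        return None
    # Spread the remaining tasks as evenly as possible over those waves.
    return math.ceil(remaining_tasks / waves)


if __name__ == "__main__":
    t = estimate_task_time([10, 12, 14])
    print(min_parallelism(100, t, 60))
```

Under these assumptions, giving each real-time job only the slots it needs to meet its deadline leaves the remaining slots free for other real-time or non-real-time jobs, which is the intuition behind running each job at minimum parallelism.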
