Algorithms for Scheduling Malleable Tasks

Because batch data processing is ubiquitous in cloud computing, the problem of scheduling malleable batch tasks, together with its extensions, has received significant attention recently. In this paper, we consider a fundamental model in which a set of n tasks is to be processed on C identical machines, and each task is specified by a value, a workload, a deadline, and a parallelism bound. Within the parallelism bound, the number of machines assigned to a task can vary over time without affecting its workload. For this model, we obtain two core results: a necessary and sufficient condition for a set of tasks to be completed by their deadlines on C machines, and an algorithm that produces such a schedule. These results provide a conceptual tool and an optimal scheduling algorithm that support new algorithmic analysis and design, as well as the improvement of existing algorithms under various objectives.
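To make the model concrete, the following minimal sketch tests whether a set of tasks can all meet their deadlines on C machines. It is not the condition or the algorithm obtained in this paper. It swaps in a standard maximum-flow formulation, under the added assumptions that time is divided into unit slots numbered from 1, that workloads, deadlines and parallelism bounds are integers, and that one machine completes one unit of work per slot. The names `Task`, `max_flow` and `is_feasible` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque, namedtuple

# Hypothetical record for the four task parameters named in the abstract.
Task = namedtuple("Task", ["value", "workload", "deadline", "bound"])


def max_flow(cap, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts of residual capacities."""
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def is_feasible(tasks, machines):
    """Return True if every task can finish by its deadline (slotted model)."""
    if not tasks:
        return True
    cap = defaultdict(dict)

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse edge for the residual graph

    horizon = max(task.deadline for task in tasks)
    for i, task in enumerate(tasks):
        # Each task must receive exactly `workload` machine-slots in total.
        add_edge("source", ("task", i), task.workload)
        # In any slot up to its deadline it may use at most `bound` machines.
        for slot in range(1, task.deadline + 1):
            add_edge(("task", i), ("slot", slot), task.bound)
    for slot in range(1, horizon + 1):
        # Each slot offers at most `machines` machine-slots.
        add_edge(("slot", slot), "sink", machines)

    total = sum(task.workload for task in tasks)
    return max_flow(cap, "source", "sink") == total
```

For example, with task A (workload 2, deadline 1, bound 2) and task B (workload 2, deadline 2, bound 1), `is_feasible` reports that two machines are not enough: A needs both machines in slot 1, so B can receive only one unit of work, in slot 2. Three machines suffice. The value field is unused here because the check concerns feasibility only.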
