Algorithms for scheduling deadline-sensitive malleable tasks

Because batch data processing is ubiquitous in cloud computing, the fundamental problem of scheduling malleable batch tasks and its extensions have received significant attention recently. In this paper, we consider an important model in which a set of n tasks is to be scheduled on C identical machines, and each task is specified by a value, a workload, a deadline, and a parallelism bound. Within the parallelism bound, the number of machines allocated to a task can vary over time without affecting its workload. For this model, we obtain two core results: a quantitative characterization of a necessary and sufficient condition under which a set of malleable batch tasks with deadlines can be feasibly scheduled on C machines, and a polynomial-time algorithm that produces such a feasible schedule. Together, these results provide a conceptual tool and an optimal scheduling algorithm that support new analyses and algorithm designs, as well as improvements to existing algorithms, for a wide range of scheduling objectives.
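To make the task model concrete, the following minimal Python sketch encodes a task by its four parameters and runs a simple necessary (not sufficient) feasibility check. It is an illustration only and is not the characterization or the algorithm obtained in the paper. The names `Task` and `passes_necessary_checks` are hypothetical, and the sketch assumes integer time slots, all tasks available at time 0, and parallelism bounds no larger than C.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    value: float      # value obtained if the task completes by its deadline
    workload: int     # total machine-slots of work required
    deadline: int     # last time slot (1-indexed) in which work may be done
    parallelism: int  # max machines usable by this task in any one slot


def passes_necessary_checks(tasks: List[Task], C: int) -> bool:
    """Return False if the task set provably cannot be scheduled on C machines.

    Assumption (not from the paper): for every time t, the work that cannot be
    postponed past t must fit into the C * t machine-slots available in [0, t].
    A task can do at most parallelism * (deadline - t) work after time t, so
    the rest of its workload must be done by t. Passing this check does not
    by itself guarantee that a feasible schedule exists.
    """
    if not tasks:
        return True
    horizon = max(task.deadline for task in tasks)
    for t in range(horizon + 1):
        must_finish_by_t = 0
        for task in tasks:
            slots_after_t = max(0, task.deadline - t)
            postponable = task.parallelism * slots_after_t
            must_finish_by_t += max(0, task.workload - postponable)
        if must_finish_by_t > C * t:
            return False
    return True
```

For example, a single task with workload 4, deadline 2, and parallelism bound 2 passes the check on C = 2 machines, whereas adding a second task with workload 1 and deadline 2 fails it, because 5 units of work cannot fit into the 4 machine-slots available before the common deadline.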
