Optimizing the stretch of independent tasks on a cluster: From sequential tasks to moldable tasks

This paper addresses the problem of scheduling non-preemptive moldable tasks to minimize the stretch of the tasks in an online, non-clairvoyant setting. To the best of the authors' knowledge, this problem has not been studied before. To tackle it, the sequential subproblem is first studied through the lens of approximation theory. An algorithm, called DASEDF, is proposed and, through simulations, is shown to outperform the first-come, first-served scheme. Furthermore, it is observed that machine availability is key to obtaining low stretch values. Then, the moldable task scheduling problem is considered, and, by leveraging the results from the sequential case, another algorithm, DBOS, is proposed to optimize the stretch while scheduling moldable tasks. This work is motivated by a task scheduling problem in parallel short sequence mapping, which has important applications in biology and genetics. The proposed DBOS algorithm is evaluated both on synthetic data sets that represent short sequence mapping requests and on data sets generated from log files of real production clusters. The results show that DBOS significantly outperforms two state-of-the-art task scheduling algorithms on stretch optimization.
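To make the objective concrete, the following minimal sketch computes the stretch of each task, that is, its flow time (completion time minus release time) divided by its processing time, for a non-preemptive first-come, first-served baseline on m identical machines. It is not DASEDF or DBOS, whose details are not given here. The `Task` class, the helper functions, and the two-task instance are hypothetical and serve only as illustration. The instance shows how a short task queued behind a long one receives a large stretch, and how one additional free machine removes that penalty.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    release: float     # time at which the task is submitted
    processing: float  # unknown to a non-clairvoyant scheduler; used here only to simulate completion


def fcfs_completion_times(tasks, m):
    """Non-preemptive first-come, first-served on m identical machines.

    Tasks are taken in release order; each one starts on the machine that
    becomes idle first, but never before its own release time.
    """
    free_at = [0.0] * m  # min-heap of the times at which each machine becomes idle
    completion = {}
    for task in sorted(tasks, key=lambda t: t.release):
        earliest = heapq.heappop(free_at)
        start = max(earliest, task.release)
        end = start + task.processing
        completion[task.name] = end
        heapq.heappush(free_at, end)
    return completion


def stretch(task, completion_time):
    """Stretch = flow time (completion - release) divided by processing time."""
    return (completion_time - task.release) / task.processing


def stretch_summary(tasks, m):
    """Return (max stretch, mean stretch) of the FCFS schedule on m machines."""
    done = fcfs_completion_times(tasks, m)
    values = [stretch(t, done[t.name]) for t in tasks]
    return max(values), sum(values) / len(values)


if __name__ == "__main__":
    # A long task released at t=0 and a short one released at t=1.
    tasks = [Task("A", 0.0, 10.0), Task("B", 1.0, 1.0)]
    print(stretch_summary(tasks, m=1))  # (10.0, 5.5): B waits behind A
    print(stretch_summary(tasks, m=2))  # (1.0, 1.0): B finds an idle machine
```

A stretch of 1 means a task ran without waiting. Because the ratio divides by processing time, any delay imposed on a short task is penalized far more heavily than the same delay on a long one.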