RLScheduler: Learn to Schedule HPC Batch Jobs Using Deep Reinforcement Learning

We present RLScheduler, a deep reinforcement learning based job scheduler for scheduling independent batch jobs in high-performance computing (HPC) environments. Starting with no knowledge of scheduling, RLScheduler autonomously learns how to schedule HPC batch jobs effectively toward a given optimization goal. This is achieved through deep reinforcement learning, aided by specially designed neural network structures and various optimizations that stabilize and accelerate the learning. Our results show that RLScheduler can outperform existing heuristic scheduling algorithms, including a manually fine-tuned machine learning based scheduler, on the same workload. More importantly, we show that RLScheduler does not blindly over-fit the given workload to achieve such optimization; instead, it learns general rules for scheduling batch jobs that can be applied to different workloads and systems to achieve similarly optimized performance. We also demonstrate that RLScheduler can adjust itself to changing goals and workloads, making it an attractive solution for future autonomous HPC management.
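The abstract does not detail RLScheduler's network architecture or training procedure, so the sketch below only illustrates the general idea in Python: a policy scores every waiting job, samples which one to run next, and is trained with a REINFORCE-style policy-gradient update toward lower average bounded slowdown. The linear scoring function, the serialized single-pool "cluster", the job features, the synthetic workload, and all names in the code are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of a learned batch-job scheduler (illustrative, not RLScheduler's design).
import numpy as np

rng = np.random.default_rng(0)

def job_features(job, now, procs=64):
    # job = (submit_time, requested_procs, runtime_seconds); normalization is ad hoc.
    wait = (now - job[0]) / 3600.0
    return np.array([wait, job[1] / procs, job[2] / 3600.0, 1.0])

def job_probs(theta, jobs, now):
    # A linear "network" scores each waiting job; a softmax turns the scores
    # into selection probabilities (the paper's actual network is more elaborate).
    feats = np.stack([job_features(j, now) for j in jobs])
    logits = feats @ theta
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return feats, p / p.sum()

def run_episode(theta, jobs):
    # Toy environment: jobs run one after another on a single resource pool,
    # which keeps the sketch short; reward is negative average bounded slowdown.
    waiting, now, grads, slowdowns = list(jobs), 0.0, [], []
    while waiting:
        feats, p = job_probs(theta, waiting, now)
        k = rng.choice(len(waiting), p=p)
        # d log pi(k) / d theta for a softmax over linear scores
        grads.append(feats[k] - p @ feats)
        submit, _, runtime = waiting.pop(k)
        now = max(now, submit) + runtime
        slowdowns.append((now - submit) / max(runtime, 10.0))
    return -float(np.mean(slowdowns)), np.sum(grads, axis=0)

def train(iterations=200, lr=1e-3):
    theta = np.zeros(4)
    for _ in range(iterations):
        # Synthetic workload: (submit_time, requested_procs, runtime_seconds).
        jobs = [(float(t), int(rng.integers(1, 64)), float(rng.integers(60, 3600)))
                for t in sorted(rng.uniform(0, 600, size=16))]
        reward, grad = run_episode(theta, jobs)
        theta += lr * reward * grad  # REINFORCE update (no baseline, for brevity)
    return theta

if __name__ == "__main__":
    print("learned scoring weights:", train())
```

The learned weight vector plays the role of the scheduling policy: after training, the job with the highest score among those waiting would be dispatched first, which is how a policy of this kind can recover, and then refine, familiar heuristics such as shortest-job-first.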
