DRL-Scheduling: An Intelligent QoS-Aware Job Scheduling Framework for Applications in Clouds

As an increasing number of traditional applications migrate to the cloud, managing resources and optimizing performance in such a dynamic and uncertain environment has become a major challenge for cloud-based application providers. Job scheduling in particular is non-trivial: the scheduler must allocate massive numbers of user-submitted job requests to the most suitable resources while satisfying user QoS requirements as far as possible. Inspired by the recent success of deep reinforcement learning in solving AI control problems, we propose an intelligent QoS-aware job scheduling framework for application providers. Its key component is a deep reinforcement learning-based job scheduler, which learns to make appropriate online job-to-VM decisions for continuous job requests directly from its own experience, without any prior knowledge. Experimental results on synthetic workloads and real-world NASA workload traces show that, compared with the baseline solutions, the proposed approach efficiently reduces average job response time (e.g., by 40.4% relative to the best baseline on the NASA traces), keeps QoS at a high level (e.g., a job success rate above 93% in all simulated changing-workload scenarios), and adapts to different workload conditions.
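To make the idea of learning job-to-VM decisions from experience concrete, the following is a minimal sketch and not the paper's method: it swaps the deep reinforcement learning scheduler for a stateless epsilon-greedy value estimate per VM. The names `Job`, `EpsilonGreedyScheduler`, and `simulate`, the reward (negative response time), and the fixed arrival gap are all assumptions made for illustration. It reports the two metrics named above, average response time and job success rate.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    length: float        # amount of work (hypothetical unit)
    qos_deadline: float  # maximum acceptable response time


def response_time(job, vm_speed, vm_backlog):
    """Waiting time already queued on the VM plus the job's execution time."""
    return vm_backlog + job.length / vm_speed


class EpsilonGreedyScheduler:
    """Toy stand-in for a learned scheduler: one value estimate per VM."""

    def __init__(self, n_vms, epsilon=0.1, lr=0.1, seed=0):
        self.q = [0.0] * n_vms
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def choose_vm(self):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, vm, reward):
        # Move the estimate for this VM toward the observed reward.
        self.q[vm] += self.lr * (reward - self.q[vm])


def simulate(jobs, vm_speeds, scheduler, arrival_gap=1.0):
    """Schedule jobs one by one; return (avg response time, success rate)."""
    backlog = [0.0] * len(vm_speeds)
    total, met = 0.0, 0
    for job in jobs:
        vm = scheduler.choose_vm()
        rt = response_time(job, vm_speeds[vm], backlog[vm])
        backlog[vm] += job.length / vm_speeds[vm]
        scheduler.update(vm, -rt)  # assumed reward: shorter response is better
        total += rt
        met += rt <= job.qos_deadline
        # Each VM works off part of its queue before the next arrival.
        backlog = [max(0.0, b - arrival_gap) for b in backlog]
    return total / len(jobs), met / len(jobs)


if __name__ == "__main__":
    jobs = [Job(length=4.0, qos_deadline=5.0) for _ in range(200)]
    avg_rt, success = simulate(jobs, [1.0, 4.0], EpsilonGreedyScheduler(2))
    print(f"avg response time: {avg_rt:.2f}, success rate: {success:.2%}")
```

A scheduler closer to the one described would condition its choice on the state of the job and the VMs and replace the per-VM estimates with a neural network; the loop of choosing a VM, observing the response time, and updating would keep the same shape.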
