Continuous self-adaptation of control policies in automatic cloud management

Deep reinforcement learning has recently been a very active field of research. Policies trained with this class of algorithms are flexible and therefore have many practical applications. In this article we present the results of applying recent advances in reinforcement learning to automate resource management in a compute cloud environment. We describe a new approach to the self-adaptation of autonomous management, in which a digital clone of the managed infrastructure is used to continuously update the control policy. We present the architecture of our system and discuss the results of an evaluation that includes autonomous management of a sample application deployed to the Amazon Web Services cloud. We also provide the details of training the management policy with the Proximal Policy Optimization algorithm. Finally, we discuss the feasibility of extending the presented approach to further scenarios.
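To make the idea of the digital clone concrete, the sketch below shows one possible shape of such a continuous self-adaptation loop. It is a minimal illustration under stated assumptions, not the system from the article: the `DigitalClone` simulator, its latency and reward models, and all constants are hypothetical, and the PPO update is delegated to the stable-baselines3 library with a Gymnasium-style environment interface.

```python
# Minimal sketch of a continuous self-adaptation loop with PPO.
# The DigitalClone dynamics, reward shape, and all constants are
# illustrative assumptions, not the model used in the article.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class DigitalClone(gym.Env):
    """Toy digital clone of the managed infrastructure: the state is
    (current load, number of VMs, mean response time); the action
    removes a VM, keeps the pool unchanged, or adds a VM."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(3)  # 0/1/2 -> -1/0/+1 VM
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(3,), dtype=np.float32)
        self.vms, self.load = 1, 0.5

    def _obs(self):
        response_time = self.load / max(self.vms, 1)  # crude latency model
        return np.array([self.load, self.vms, response_time],
                        dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.vms = 1
        self.load = float(self.np_random.uniform(0.1, 1.0))
        return self._obs(), {}

    def step(self, action):
        self.vms = int(np.clip(self.vms + (int(action) - 1), 1, 20))
        # Stand-in for replaying the workload observed in production.
        self.load = float(np.clip(
            self.load + self.np_random.uniform(-0.1, 0.1), 0.05, 2.0))
        # Penalize both slow responses and over-provisioned VMs.
        reward = -(self.load / self.vms) - 0.05 * self.vms
        return self._obs(), reward, False, False, {}


policy = PPO("MlpPolicy", DigitalClone(), verbose=0)

for adaptation_round in range(5):
    # 1. Retrain the policy against the clone, which would be kept in
    #    sync with metrics recorded from the real deployment.
    policy.learn(total_timesteps=2_048)
    # 2. Drive the managed application with the updated policy (stubbed
    #    here by querying the clone instead of a real cloud API).
    env = DigitalClone()
    obs, _ = env.reset()
    for _ in range(100):
        action, _ = policy.predict(obs, deterministic=True)
        obs, _, _, _, _ = env.step(int(action))
```

In the setting described by the article, step 2 would issue actual scaling requests to the AWS deployment, and the metrics it records would be fed back into the clone before the next round of retraining; here both halves of the loop run against the simulator for self-containment.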
