Predictive auto-scaling with OpenStack Monasca

Cloud auto-scaling mechanisms are typically based on reactive automation rules that scale a cluster whenever some metric, e.g., the average CPU usage among instances, exceeds a predefined threshold. Tuning these rules becomes particularly cumbersome when scaling up a cluster involves non-negligible times to bootstrap new instances, as frequently happens in production cloud services. To deal with this problem, we propose an architecture for auto-scaling cloud services based on the state that the system is expected to reach in the near future. Our approach leverages time-series forecasting techniques, such as those based on machine learning and artificial neural networks, to predict the future dynamics of key metrics, e.g., resource consumption metrics, and applies a threshold-based scaling policy to the predicted values. The result is a predictive automation policy that is able, for instance, to anticipate peaks in the load of a cloud application and trigger appropriate scaling actions ahead of time to accommodate the expected increase in traffic. We prototyped our approach as an open-source OpenStack component that relies on, and extends, the monitoring capabilities offered by Monasca, adding predictive metrics that can be leveraged by orchestration components like Heat or Senlin. We show experimental results using a recurrent neural network and a multi-layer perceptron as predictors, and compare them with a simple linear regression and a traditional non-predictive auto-scaling policy. The proposed framework nonetheless allows the prediction policy to be customized easily as needed.
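The difference between a reactive rule and a predictive one can be illustrated with a minimal sketch. It is not the prototype's code and it makes no Monasca, Heat, or Senlin calls: the function names, the sample values, the thresholds, and the forecasting horizon are all assumptions chosen for illustration. The predictor is a plain least-squares linear regression over a window of recent samples, standing in for the simplest of the predictors mentioned above; a neural-network predictor would replace `forecast_linear` without changing the policy.

```python
from typing import Sequence


def forecast_linear(samples: Sequence[float], horizon: int) -> float:
    """Extrapolate a metric `horizon` steps past the last sample.

    Fits a least-squares line to equally spaced samples (hypothetical
    helper; stands in for any time-series predictor).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + horizon)


def scaling_decision(value: float, scale_out_above: float,
                     scale_in_below: float) -> str:
    """Threshold-based policy; the same rule serves both modes."""
    if value > scale_out_above:
        return "scale_out"
    if value < scale_in_below:
        return "scale_in"
    return "hold"


# Assumed data: average CPU usage (%) sampled once per minute.
cpu = [40.0, 45.0, 50.0, 55.0, 60.0]

# Reactive policy: apply the thresholds to the latest observed value.
reactive = scaling_decision(cpu[-1], scale_out_above=80.0, scale_in_below=20.0)

# Predictive policy: apply the same thresholds to the value expected
# 5 minutes ahead (assumed to be roughly the instance bootstrap time).
predicted = forecast_linear(cpu, horizon=5)
predictive = scaling_decision(predicted, scale_out_above=80.0, scale_in_below=20.0)

print(reactive, predicted, predictive)
```

With these assumed samples the observed value is still below the scale-out threshold, while the extrapolated value exceeds it, so only the predictive rule triggers a scale-out early enough for the new instance to finish bootstrapping before the load arrives.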
