Machine Learning based SLA-Aware VNF Anomaly Detection for Virtual Network Management

Since the concepts of Software-Defined Networking (SDN) and Network Function Virtualization (NFV) were proposed, telcos and service providers have leveraged them to deliver their services more efficiently. However, as virtual networks in data centers become more complex, a variety of new network management problems arise. To deal with these problems, it is necessary to monitor and analyze the resource usage and traffic load of the Virtual Network Functions (VNFs) operating on the virtual network. Recently, there have been many attempts to develop technologies that enable network management without human intervention. In this paper, we formulate our anomaly detection problem in terms of scenarios involving SLA violations, so that it meets the practical needs of network management. We also set up a real-world NFV environment to generate anomalous data corresponding to each scenario, and we extend our approach to a root-cause localization system that identifies the exact VNF instance causing the SLA-related anomalies. We use datasets collected from VNF service function chain scenarios implemented in an OpenStack environment and compare the accuracy of anomaly detection models built with various machine learning algorithms. Our experimental results show that the best model achieves an F1-measure over 95% for anomaly detection and 93% for root-cause localization.
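For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-measure is computed for a binary anomaly detector. The labels are toy values invented for illustration (1 = anomalous sample, 0 = normal sample); they are not taken from the paper's datasets, and the helper name `f1_measure` is hypothetical.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Return the F1-measure (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: anomalies that were flagged as anomalies.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: normal samples that were flagged as anomalies.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: anomalies that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy ground truth and predictions (illustrative assumption only).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
score = f1_measure(y_true, y_pred)
print(f"F1-measure: {score:.2f}")
```

Because F1 combines precision and recall, it is less misleading than plain accuracy when anomalous samples are much rarer than normal ones, which is typically the case in monitoring data.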
