Abstract: Virtualization, a key IT technology, has become the predominant model in data centers in recent years. The flexibility to scale out and migrate virtual machines for seamless maintenance has enabled a new level of continuous operation and changed service provisioning significantly. Meanwhile, services from domains striving for the highest possible availability, such as telecommunications, are adopting this approach as well and are investing significant effort in the development of Network Function Virtualization (NFV). However, the availability requirements for such infrastructures are much higher than is typical for IT services built on standard software and off-the-shelf hardware. They require sophisticated methods and mechanisms for fast failure detection and recovery. This paper presents a set of methods and an implemented prototype for anomaly detection in cloud-based infrastructures, with a specific focus on the deployment of virtualized network functions. The framework is built upon OpenStack, the current de facto standard for open-source cloud software, and aims to increase availability and fault tolerance by providing an extensive monitoring and analysis pipeline able to detect failures or degraded performance in real time. The indicators for anomalies are created using supervised and unsupervised classification methods, and preliminary experimental measurements showed a high percentage of correctly identified anomalies. After a failure is detected, a set of predefined countermeasures is activated to mask or repair outages or situations with degraded performance.