RARE: a labeled dataset for cloud-native memory anomalies

Anomaly detection has attracted interest from both industry and the research community for many years, and the number of published papers and adopted services has grown rapidly over the last decade. One reason is the wide adoption of cloud systems by major players in industries such as online shopping, advertising and remote computing. In this work we propose RARE, a Dataset foR cloud-nAtive memoRy anomaliEs. It contains labelled anomaly time-series data comprising over 900 unique metrics. The dataset was generated using a microservice that injects an artificial byte stream to overload the nodes, provoking memory anomalies that in some cases resulted in a crash. The system was built using a Kafka server deployed on a Kubernetes system, and we used Prometheus to access and download the metrics related to the server. The dataset can be coupled with machine learning algorithms for detecting anomalies in a cloud-based system, and it will be available as a CSV file through an online repository. We also include an example application that uses a Random Forest algorithm to classify the data as anomalous or not. The goal of the RARE dataset is to support the development of more accurate and reliable machine learning methods for anomaly detection in cloud-based systems.
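As an illustration of the kind of Random Forest classification mentioned above, the following is a minimal sketch and not the authors' pipeline. It assumes scikit-learn and NumPy are available, and it uses synthetic data in place of the RARE CSV file, whose column layout is not described here. The helper names (`make_synthetic_metrics`, `train_and_score`), the number of metrics, and the anomaly rate are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def make_synthetic_metrics(n_samples=1000, n_metrics=20, anomaly_rate=0.1, seed=0):
    """Build a stand-in for labelled metric samples (NOT the real RARE data).

    Each row is one time step, each column one metric; y is 1 for anomalous rows.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_metrics))
    y = (rng.random(n_samples) < anomaly_rate).astype(int)
    # Assumption: anomalies show up as a shift in a few memory-like metrics.
    X[y == 1, :3] += 4.0
    return X, y


def train_and_score(X, y, seed=0):
    """Fit a Random Forest and return it with its F1 score on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # class_weight="balanced" compensates for anomalies being the minority class.
    clf = RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=seed
    )
    clf.fit(X_train, y_train)
    return clf, f1_score(y_test, clf.predict(X_test))


X, y = make_synthetic_metrics()
clf, f1 = train_and_score(X, y)
print(f"F1 on held-out synthetic data: {f1:.2f}")
```

The random split above ignores temporal order. For real time-series data, a chronological split (training on earlier samples, testing on later ones) would likely give a more honest estimate of detection performance.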
