Real-Time Anomaly Detection of NoSQL Systems Based on Resource Usage Monitoring

Today, the emergence of industrial revolution systems such as Industry 4.0, the Internet of Things, and big data frameworks poses new challenges for the storage and processing of real-time data. As systems scale to very large sizes, a crucial task is to administer the variety of subsystems and applications to ensure high performance. This is directly related to identifying and eliminating system failures and errors while the system runs. In particular, database systems may experience abnormalities such as decreased throughput or increased resource usage, which in turn affect system performance. In this article, we focus on Not only SQL (NoSQL) database systems, which are well suited to storing sensor data in the context of Industry 4.0. Such deployments typically include a variety of applications and workloads that are difficult to monitor online, which makes anomaly detection a challenging task. Creating a robust platform to serve such infrastructures with minimal hardware or software failures is a key challenge. We propose RADAR, an anomaly detection system that works in real time. RADAR is a data-driven decision-making system for NoSQL systems: it extracts process information during resource monitoring and associates resource usage with the top processes to identify anomalous cases. We focus on anomalies such as hardware failures or software bugs that lead to abnormal application runs that do not necessarily stop the system (e.g., through a system crash) but degrade its performance (e.g., decreased database throughput). Although different patterns may occur over time, we focus on periodically running workloads (e.g., monitoring daily usage), which are very common for NoSQL systems, and on Internet of Things scenarios where data streams are forwarded to the Cloud for storage and processing.
We apply various machine learning algorithms, such as autoregressive integrated moving average (ARIMA), seasonal ARIMA, and long short-term memory (LSTM) recurrent neural networks. We experimentally analyze our solution to demonstrate the benefits of supporting online erroneous state identification and characterization for modern applications.
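
To make the idea of detecting anomalies in a periodic workload concrete, the following minimal sketch flags resource usage samples that deviate strongly from a seasonal baseline. It is not RADAR's implementation: the ARIMA, seasonal ARIMA, and LSTM models are replaced here by a much simpler seasonal-median forecast, and the function name `flag_usage_anomalies`, the threshold of 3.5, and the synthetic hourly series are all assumptions made for illustration.

```python
import math
import statistics


def flag_usage_anomalies(usage, period, threshold=3.5):
    """Return indices whose usage deviates strongly from the seasonal baseline.

    The baseline for sample i is the median of the earlier samples at the
    same phase (i - period, i - 2 * period, ...), so each forecast uses
    only past data.
    """
    if period <= 0 or len(usage) <= period:
        raise ValueError("need a positive period and more than one period of data")

    residuals = []
    for i in range(period, len(usage)):
        same_phase_history = usage[i % period:i:period]
        residuals.append(usage[i] - statistics.median(same_phase_history))

    # Robust centre and scale (median and MAD), so that the anomalies
    # themselves do not inflate the threshold.
    centre = statistics.median(residuals)
    mad = statistics.median(abs(r - centre) for r in residuals)
    scale = 1.4826 * mad or 1e-9  # guard against a zero MAD
    return [
        i + period
        for i, r in enumerate(residuals)
        if abs(r - centre) / scale > threshold
    ]


# Hypothetical hourly usage (percent) over five days with a daily cycle
# and a small deterministic ripple standing in for measurement noise.
period = 24
usage = [
    40 + 15 * math.sin(2 * math.pi * t / period) + 0.5 * ((2 * t) % 5 - 2)
    for t in range(5 * period)
]
usage[82] += 40  # inject one abnormal spike
print(flag_usage_anomalies(usage, period))
```

The residual scale is computed over the whole recorded series, so this is an offline illustration; a real-time variant would maintain the scale over a sliding window and could substitute a fitted seasonal ARIMA or LSTM forecast for the seasonal median.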
