Hybrid CAE-VAE for Unsupervised Anomaly Detection in Log File Systems

Anomaly detection is of paramount importance in big data systems, whose logs record abruptly changing events that can produce consequential outliers. Because these logs are highly unstructured, traditional machine learning methods fail to detect anomalies in them. Prominent approaches include supervised techniques, which require labelled data, and unsupervised techniques, which rely on an error metric. Moreover, supervised methods can only capture the kinds of anomaly present in their training data, so they fail on any new type of anomaly. This motivates unsupervised learning techniques with an easy-to-interpret anomaly score. In this paper, we propose a hybrid Convolutional Autoencoder-Variational Autoencoder (CAE-VAE) architecture for discrete event sequences, which are obtained by processing log files into log keys derived from individual entries. We evaluate our model on Hadoop Distributed File System (HDFS) logs. Unlike most traditional autoencoder approaches, which use reconstruction error for anomaly detection, our model derives a likelihood metric that can be interpreted as an anomaly score. We also compare our model with a supervised CNN model and an unsupervised CAE model and show empirically that our model achieves better results.
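The abstract does not spell out how log entries become discrete event sequences. The following is a minimal sketch under stated assumptions. Each message is reduced to a log key by masking variable fields with regular expressions, a simple stand-in for a dedicated log parser. Messages are then grouped into one sequence per HDFS block ID, which is a common convention for HDFS logs but is not confirmed by the abstract. The masking rules, the function names `log_key` and `to_event_sequences`, and the sample messages are all hypothetical; the messages only imitate the shape of HDFS entries, with the header fields (date, time, PID, level) assumed to be stripped already.

```python
import re
from collections import defaultdict

# Hypothetical masking rules: variable fields become placeholders so that
# messages printed by the same logging statement share one log key.
BLOCK_RE = re.compile(r"blk_-?\d+")
MASKS = [
    (BLOCK_RE, "<BLK>"),
    (re.compile(r"\d+\.\d+\.\d+\.\d+(?::\d+)?"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def log_key(message):
    """Reduce a log message to its constant template (its log key)."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


def to_event_sequences(messages):
    """Map each log key to an integer ID and group the IDs by HDFS block."""
    key_ids = {}
    sequences = defaultdict(list)
    for message in messages:
        block = BLOCK_RE.search(message)
        if block is None:
            continue  # this sketch ignores messages without a block ID
        key_id = key_ids.setdefault(log_key(message), len(key_ids))
        sequences[block.group()].append(key_id)
    return dict(sequences), key_ids


# Illustrative messages imitating HDFS entries (headers already stripped).
sample_messages = [
    "Receiving block blk_111 src: /10.0.0.1:5000 dest: /10.0.0.2:50010",
    "Receiving block blk_222 src: /10.0.0.3:5001 dest: /10.0.0.4:50010",
    "PacketResponder 1 for block blk_111 terminating",
    "PacketResponder 0 for block blk_222 terminating",
    "Received block blk_111 of size 67108864 from /10.0.0.1",
]
sequences, key_ids = to_event_sequences(sample_messages)
print(sequences)  # {'blk_111': [0, 1, 2], 'blk_222': [0, 1]}
```

Each block's integer sequence is the kind of discrete input a sequence model such as the CAE-VAE would consume. A real pipeline would also need a policy for log keys that never appeared during training, for example a reserved "unknown" ID.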
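The abstract states that the model derives a likelihood metric but not how it is computed. One standard way to obtain a likelihood from a VAE, assumed here, is importance sampling with the encoder q(z|x) as the proposal: log p(x) ≈ log((1/K) Σ_k p(x|z_k) p(z_k) / q(z_k|x)) with each z_k drawn from q(z|x), and the anomaly score is -log p(x). To keep the sketch checkable, a one-dimensional linear-Gaussian model stands in for the CAE-VAE (prior N(0, 1), decoder N(z, σ²)), so the exact marginal N(0, 1 + σ²) is known. For log-key sequences the decoder term would instead be a categorical log-probability. The functions `importance_log_likelihood` and `anomaly_score` and the toy model are hypothetical.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of the univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_mean_exp(values):
    """Numerically stable log((1/K) * sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def importance_log_likelihood(x, encode, log_prior, log_decoder, k=100, rng=random):
    """Estimate log p(x) by importance sampling with q(z|x) as the proposal."""
    mu, var = encode(x)  # mean and variance of the Gaussian q(z|x)
    log_weights = []
    for _ in range(k):
        z = rng.gauss(mu, math.sqrt(var))
        # log w_k = log p(x|z_k) + log p(z_k) - log q(z_k|x)
        log_weights.append(log_decoder(x, z) + log_prior(z) - log_normal(z, mu, var))
    return log_mean_exp(log_weights)


def anomaly_score(x, **model):
    """Negative estimated log-likelihood: higher means more anomalous."""
    return -importance_log_likelihood(x, **model)


# Hypothetical toy model: z ~ N(0, 1), x | z ~ N(z, SIGMA2).
SIGMA2 = 0.5
toy_model = dict(
    # Exact posterior of this linear-Gaussian model, used as the "encoder".
    encode=lambda x: (x / (1 + SIGMA2), SIGMA2 / (1 + SIGMA2)),
    log_prior=lambda z: log_normal(z, 0.0, 1.0),
    log_decoder=lambda x, z: log_normal(x, z, SIGMA2),
)

print(anomaly_score(0.2, **toy_model) < anomaly_score(4.0, **toy_model))  # True
```

Because the toy encoder returns the exact posterior, every importance weight equals p(x) and the estimate is exact. With a learned encoder, the estimated log-likelihood is, in expectation, a lower bound on log p(x) (by Jensen's inequality) that tightens as K grows.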
