A Novel Anomaly Detection Algorithm Based on Trident Tree

In this paper, we propose a novel anomaly detection algorithm, named T-Forest, which is built from multiple trident trees (T-trees). Each T-tree is constructed recursively: at every node, the data points lying outside the 3 sigma range are isolated into the left and right subtrees, and the remaining points go into the middle subtree. Each node in a T-tree records the number of data points that fall into it, so that each T-tree can serve as a local density estimator for data points. The density value of an instance is the average of the densities estimated by all trees, and it can be used as the anomaly score of the instance. Because each T-tree is constructed according to the 3 sigma principle, each tree in T-Forest can obtain good anomaly detection results without a large tree height. Compared with some state-of-the-art methods, our algorithm performs well in terms of AUC and has linear time and space complexity. The experimental results show that our approach not only detects anomalous points effectively, but also tends to converge within a certain range of parameter values.
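The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the split attribute is chosen, how trees are sampled, or how the node counts are normalized, so this sketch assumes a uniformly random attribute per node, a random subsample per tree, and the leaf count divided by the tree's sample size as the density estimate. All names (`TNode`, `build_t_tree`, `build_t_forest`, `anomaly_score`) and default parameter values are hypothetical.

```python
import random
import statistics


class TNode:
    """One node of a T-tree; `size` is the number of training points that reached it."""

    def __init__(self, size):
        self.size = size
        self.attr = None      # index of the split attribute (None for a leaf)
        self.low = None       # mean - 3 * std on that attribute
        self.high = None      # mean + 3 * std on that attribute
        self.children = None  # (left, middle, right), or None for a leaf


def build_t_tree(points, height, max_height, rng):
    """Recursively split `points` into left / middle / right by the 3 sigma rule."""
    node = TNode(len(points))
    if height >= max_height or len(points) <= 1:
        return node
    # Assumption: the split attribute is picked uniformly at random.
    attr = rng.randrange(len(points[0]))
    values = [p[attr] for p in points]
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return node  # all values identical, nothing to separate
    node.attr, node.low, node.high = attr, mu - 3 * sigma, mu + 3 * sigma
    left = [p for p in points if p[attr] < node.low]
    middle = [p for p in points if node.low <= p[attr] <= node.high]
    right = [p for p in points if p[attr] > node.high]
    node.children = tuple(
        build_t_tree(part, height + 1, max_height, rng)
        for part in (left, middle, right)
    )
    return node


def leaf_size(node, x):
    """Route `x` down the tree and return the count stored at the leaf it reaches."""
    while node.children is not None:
        if x[node.attr] < node.low:
            node = node.children[0]
        elif x[node.attr] > node.high:
            node = node.children[2]
        else:
            node = node.children[1]
    return node.size


def build_t_forest(data, n_trees=25, sample_size=256, max_height=4, seed=0):
    """Build `n_trees` T-trees, each on a random subsample (an assumption of this sketch)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        forest.append(build_t_tree(sample, 0, max_height, rng))
    return forest


def anomaly_score(forest, x):
    """Average normalized leaf count over all trees; lower values mean sparser regions."""
    return statistics.fmean([leaf_size(tree, x) / tree.size for tree in forest])
```

Under these assumptions, calling `build_t_forest(data)` on a list of equal-length numeric tuples and then `anomaly_score(forest, x)` on a query point yields a value between 0 and 1. Points that are routed into a sparsely populated left or right branch receive a low density and are therefore the more likely anomalies.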
