Research and Improvement of Isolation Forest in Detection of Local Anomaly Points

This paper studies and compares three families of anomaly detection algorithms: classification-based, density-based, and isolation-based. The comparison concludes that the Isolation Forest algorithm has low time complexity and gives a quantitative description of anomalies, which makes it clearly preferable to the other algorithms. However, it is weak at detecting local anomaly points, and this reduces its accuracy. An improved algorithm based on Isolation Forest is therefore proposed. Its main idea is to use the K-means algorithm to divide the samples into clusters, so that points that were local anomalies before clustering become global anomalies of their adjacent clusters; the anomaly score of each sample is then calculated within its cluster. Experimental results show that the improved algorithm detects local anomaly points better than the original Isolation Forest algorithm.
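
The idea of clustering first and scoring within each cluster can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation: the abstract does not say how the number of clusters is chosen or how adjacent clusters are handled, so the function names (`kmeans`, `iforest_scores`, `clustered_iforest_scores`), the parameter defaults, and the toy data below are all assumptions made for illustration. The score follows the usual Isolation Forest form s(x) = 2^(-E[h(x)] / c(n)), where h(x) is the path length of x in a tree and c(n) normalizes by the average path length for n points.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, limit, rng):
    """Isolation tree: split on a random feature at a random value."""
    if depth >= limit or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop instead of retrying another feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, limit, rng),
            build_tree(right, depth + 1, limit, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus c(size) for an unsplit leaf."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def iforest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] per point; higher means more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    limit = math.ceil(math.log2(max(psi, 2)))
    trees = [build_tree(rng.sample(points, psi), 0, limit, rng)
             for _ in range(n_trees)]
    norm = c(psi) or 1.0  # avoid dividing by zero for a one-point sample
    return [2.0 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
            for x in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def clustered_iforest_scores(points, k, seed=0):
    """Assumed variant: cluster with K-means, then score inside each cluster."""
    labels = kmeans(points, k, seed=seed)
    scores = [0.0] * len(points)
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        for i, s in zip(idx, iforest_scores([points[i] for i in idx], seed=seed)):
            scores[i] = s
    return scores


# Toy data (made up for illustration): a tight cluster, a loose cluster,
# and one point that sits just outside the tight cluster.
data_rng = random.Random(42)
dense = [(data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)) for _ in range(200)]
sparse = [(data_rng.gauss(10, 1.0), data_rng.gauss(10, 1.0)) for _ in range(200)]
points = dense + sparse + [(1.2, 1.2)]

global_scores = iforest_scores(points)
local_scores = clustered_iforest_scores(points, k=2)
print("global score of the last point:", round(global_scores[-1], 3))
print("per-cluster score of the last point:", round(local_scores[-1], 3))
```

In this sketch the last point is scored only against the members of the cluster K-means assigns it to, so its distance from that cluster is no longer masked by the spread of the whole data set. A library implementation of K-means and Isolation Forest would normally replace the hand-written versions here.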
