Outlier detection using isolation forest and local outlier factor

Outlier detection, also known as anomaly detection, is an active research topic in data mining. Isolation Forest (iForest) and Local Outlier Factor (LOF) are two well-known and widely used outlier detection algorithms. However, iForest is sensitive only to global outliers and handles local outliers poorly, while LOF detects local outliers well but has high time complexity. To overcome these weaknesses, a two-layer progressive ensemble method for outlier detection is proposed. It can detect outliers accurately in complex datasets with low time complexity. The method first uses the low-complexity iForest to scan the dataset quickly, prune the data that are clearly normal, and generate an outlier candidate set. To improve pruning accuracy, an outlier coefficient is introduced to set the pruning threshold according to the outlier degree of the data. LOF is then applied to the outlier candidate set to identify the outliers more precisely. The ensemble thus combines the advantages of both algorithms and concentrates computing resources on the key stage. Finally, extensive experiments are carried out to verify the ensemble method. The results show that, compared with existing methods, it significantly improves the outlier detection rate and greatly reduces the time complexity.
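The two-layer idea described above can be illustrated with a minimal sketch, assuming scikit-learn's IsolationForest and LocalOutlierFactor as stand-ins for the two layers. The function name two_stage_outliers and the parameters candidate_fraction, n_outliers, and n_neighbors are hypothetical. In particular, the fixed candidate_fraction is only a placeholder for the outlier-coefficient-based pruning threshold, whose definition is not given in this abstract, and fitting LOF on the candidate set alone is likewise an assumption.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def two_stage_outliers(X, candidate_fraction=0.2, n_outliers=5,
                       n_neighbors=10, seed=0):
    """Return indices of the n_outliers most anomalous rows of X.

    Layer 1 (iForest) prunes clearly normal points; layer 2 (LOF)
    ranks only the surviving candidates.
    """
    X = np.asarray(X, dtype=float)

    # Layer 1: cheap global scan. score_samples returns lower values
    # for more abnormal points.
    forest = IsolationForest(n_estimators=100, random_state=seed).fit(X)
    iforest_scores = forest.score_samples(X)

    # ASSUMPTION: keep a fixed fraction of the most abnormal points.
    # The source derives this threshold from an "outlier coefficient"
    # that is not specified here.
    n_candidates = max(n_outliers + 1,
                       int(np.ceil(candidate_fraction * len(X))))
    n_candidates = min(n_candidates, len(X))
    candidate_idx = np.argsort(iforest_scores)[:n_candidates]

    # Layer 2: LOF restricted to the candidate set (ASSUMPTION: the
    # neighbourhoods are computed among candidates only).
    k = min(n_neighbors, n_candidates - 1)
    lof = LocalOutlierFactor(n_neighbors=k).fit(X[candidate_idx])

    # negative_outlier_factor_ is more negative for stronger outliers.
    order = np.argsort(lof.negative_outlier_factor_)[:n_outliers]
    return candidate_idx[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(size=(200, 2))
    planted = np.array([[20.0, 20.0], [-20.0, 20.0], [22.0, -18.0]])
    data = np.vstack([inliers, planted])
    print(two_stage_outliers(data))
```

Because LOF only sees the candidate set, its neighbourhood search runs on a fraction of the data, which is where the time saving in this sketch comes from. The trade-off is that the pruning step must not discard true outliers, so candidate_fraction should be chosen generously.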
