An Optimized Computational Framework for Isolation Forest

Isolation Forest (iForest) is one of the outstanding outlier detectors proposed in recent years. However, its model construction relies mainly on randomization, so it is unclear how to select a proper attribute and how to locate an optimized split point on a given attribute when building an isolation tree. To address these two issues, we propose an improved computational framework that seeks the most separable attributes and effectively locates the corresponding optimized split points. Experimental results show that the proposed model achieves better overall outlier detection accuracy than the original model and its related variants.
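To make the contrast concrete, the following minimal sketch compares the two ways of choosing a split. `random_split` follows the standard iForest rule: pick an attribute uniformly at random, then a cut point uniformly between that attribute's minimum and maximum. `largest_gap_split` is a hypothetical stand-in for a separability-driven choice; the abstract does not state the authors' actual criterion, so the normalized largest-gap heuristic and both function names are assumptions made purely for illustration.

```python
import random


def random_split(rows, rng):
    """Standard iForest-style split: random attribute, random cut point."""
    attr = rng.randrange(len(rows[0]))
    values = [row[attr] for row in rows]
    return attr, rng.uniform(min(values), max(values))


def largest_gap_split(rows):
    """Hypothetical deterministic split (an assumption, not the paper's rule).

    Picks the attribute whose sorted values contain the widest gap relative
    to the attribute's range, and cuts at the midpoint of that gap.
    Returns None when every attribute is constant.
    """
    best = None  # (score, attribute index, split value)
    for attr in range(len(rows[0])):
        values = sorted(row[attr] for row in rows)
        span = values[-1] - values[0]
        if span == 0:
            continue  # constant attribute cannot separate anything
        for left, right in zip(values, values[1:]):
            score = (right - left) / span
            if best is None or score > best[0]:
                best = (score, attr, (left + right) / 2)
    if best is None:
        return None
    return best[1], best[2]


if __name__ == "__main__":
    # Three clustered points and one point that is far away on attribute 1.
    data = [(1.0, 10.0), (1.1, 10.2), (0.9, 10.1), (1.0, 25.0)]
    print("random split:", random_split(data, random.Random(0)))
    print("largest-gap split:", largest_gap_split(data))
```

On this toy data the gap heuristic cuts attribute 1 between the cluster and the distant point, isolating that point in a single split, whereas the random rule may pick attribute 0 and need further splits. Whether the proposed framework behaves this way depends on its actual criterion, which is described in the full paper rather than in this abstract.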
