Isolation forests: looking beyond tree depth

The isolation forest algorithm for outlier detection exploits a simple yet effective observation: if one takes some multivariate data and recursively makes uniformly random cuts across the feature space, an outlier will be left alone in its subspace after fewer cuts than a regular observation. The original proposal scored outliers by the tree depth (the number of random cuts) required for isolation. Experiments here show that also using the size of the feature space that a point ends up in, together with the number of points assigned to that space, can improve results in many situations without any modification to the tree structure, especially in the presence of categorical features.
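The observation above can be illustrated with a minimal sketch in Python. It is not the paper's implementation: the function names `isolate` and `average_depth`, the toy data, and the choice to track the logarithm of the retained range fraction are all assumptions made for illustration, and the subsampling used by the original isolation forest is omitted. The sketch follows one point through random axis-aligned cuts and reports the depth at which it is isolated, along with the two extra quantities the abstract mentions: how many points remain in the final region and how much of the feature space that region covers.

```python
import math
import random


def isolate(point, data, rng, max_depth=50):
    """Follow `point` through random axis-aligned cuts until it is alone.

    Returns (depth, n_points, log_volume_fraction) for the final region.
    `point` is assumed to be one of the rows in `data`.
    """
    depth = 0
    log_vol = 0.0  # log of the fraction of the per-node ranges that was kept
    while len(data) > 1 and depth < max_depth:
        if all(row == data[0] for row in data):
            break  # only duplicates remain, no cut can separate them
        dim = rng.randrange(len(point))
        lo = min(row[dim] for row in data)
        hi = max(row[dim] for row in data)
        if lo == hi:
            continue  # no spread on this feature, pick another one
        cut = rng.uniform(lo, hi)
        if cut <= lo or cut >= hi:
            continue  # degenerate cut, draw again
        if point[dim] < cut:
            data = [row for row in data if row[dim] < cut]
            log_vol += math.log((cut - lo) / (hi - lo))
        else:
            data = [row for row in data if row[dim] >= cut]
            log_vol += math.log((hi - cut) / (hi - lo))
        depth += 1
    return depth, len(data), log_vol


def average_depth(point, data, n_trees=100, seed=0):
    """Average isolation depth of `point` over several random trees."""
    rng = random.Random(seed)
    return sum(isolate(point, data, rng)[0] for _ in range(n_trees)) / n_trees


# Toy data (hypothetical): one Gaussian cluster plus a single distant point.
_rng = random.Random(1)
cluster = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data = cluster + [outlier]
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)  # most central point

if __name__ == "__main__":
    print("outlier depth:", average_depth(outlier, data))
    print("inlier depth: ", average_depth(inlier, data))
```

On data like this, the distant point should need noticeably fewer cuts on average than the central one. The returned point count and log range fraction are the raw ingredients for the kind of alternative score the abstract describes; how they are combined into a score is not stated here, so the sketch leaves them uncombined.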
