OptIForest: Optimal Isolation Forest for Anomaly Detection

Anomaly detection plays an increasingly important role in critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. Among the many anomaly detection methods that have been proposed, those based on the isolation forest mechanism stand out for their simplicity, effectiveness, and efficiency; iForest, for example, is often deployed in practice as a state-of-the-art detector. While most isolation forests use a binary tree structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, no theoretical work has answered the fundamentally and practically important question of which tree structure is optimal for an isolation forest with respect to the branching factor. In this paper, we establish a theory of isolation efficiency to answer this question and determine the optimal branching factor for an isolation tree. Building on this theory, we design a practical optimal isolation forest, OptIForest, which incorporates clustering-based learning to hash so that more information can be learned from the data for better isolation quality. The rationale of our approach is that OptIForest achieves a better bias-variance trade-off through bias reduction. Extensive comparative and ablation experiments on a series of benchmark datasets demonstrate that our approach generally achieves better detection performance, efficiently and robustly, than state-of-the-art methods, including deep learning based methods.
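The isolation mechanism and the role of the branching factor can be illustrated with a minimal sketch. It is not the OptIForest algorithm: it uses plain random cut points on 1-D data rather than clustering-based learning to hash, and it omits the subsampling used by iForest. The function names (`isolation_depth`, `mean_depth`, `make_demo_data`) and all parameter values are hypothetical, chosen only for illustration.

```python
import bisect
import random


def isolation_depth(point, data, branching, rng, max_depth=64):
    """Count the random multi-way splits needed to isolate `point` within 1-D `data`."""
    depth = 0
    while len(data) > 1 and depth < max_depth:
        lo, hi = min(data), max(data)
        if lo == hi:  # only duplicates remain; no further split is possible
            break
        # branching - 1 random cut points divide [lo, hi] into `branching` intervals
        cuts = sorted(rng.uniform(lo, hi) for _ in range(branching - 1))
        bucket = bisect.bisect_right(cuts, point)
        # keep only the values that fall into the same interval as `point`
        data = [x for x in data if bisect.bisect_right(cuts, x) == bucket]
        depth += 1
    return depth


def mean_depth(point, data, branching, n_trees=100, seed=0):
    """Average isolation depth of `point` over `n_trees` independent random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, branching, rng) for _ in range(n_trees))
    return total / n_trees


def make_demo_data(seed=42):
    """Hypothetical toy data: 200 Gaussian inliers plus one far-away outlier."""
    rng = random.Random(seed)
    inliers = [rng.gauss(0.0, 1.0) for _ in range(200)]
    outlier = 8.0
    typical = sorted(inliers)[100]  # a point from the dense centre of the data
    return inliers + [outlier], outlier, typical


if __name__ == "__main__":
    data, outlier, typical = make_demo_data()
    for branching in (2, 3, 4):  # binary tree versus multi-fork trees
        print(
            f"branching={branching}: "
            f"outlier depth={mean_depth(outlier, data, branching):.2f}, "
            f"typical depth={mean_depth(typical, data, branching):.2f}"
        )
```

In this sketch an anomaly is a point that is isolated after few splits, so a shorter average depth corresponds to a higher anomaly score. Changing `branching` changes both the depth of each tree and the number of children per node, which is the trade-off that a theory of isolation efficiency has to weigh.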
