Integrated Learning Method for Anomaly Detection Combining KLSH and Isolation Principles

To address the low accuracy of Isolation Forest (iForest) in detecting local anomalies in high-dimensional, massive data sets, this paper proposes an anomaly detection method that combines a locality-sensitive hashing algorithm based on the Gaussian kernel function (KLSH) with a mean-optimized iForest algorithm. In this method (KLSH+iForest), the kernel function maps the data from a linearly inseparable data space to a linearly separable feature space, which converts local anomalies into global anomalies. An iForest is then constructed on the kernelized data set to perform anomaly detection. To address the problem of selecting the optimal split attributes and split values for iForest, this paper proposes a mean optimization strategy. KLSH+iForest retains iForest's ability to detect global anomalies while improving the accuracy of local anomaly detection. We compare KLSH+iForest with the LOF algorithm and with improved LSH-based algorithms on public data sets. Experimental results show that KLSH+iForest significantly improves both the accuracy and the efficiency of anomaly detection on high-dimensional, massive data sets.
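The two-stage idea described above (kernelize the data, then isolate points with trees) can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: the hashing step of KLSH is omitted and replaced by plain Gaussian-kernel similarities to a few hand-picked anchor points; the "mean optimization" is read as splitting each node at the mean of a randomly chosen attribute, which may differ from the strategy actually proposed; and all names (`kernelize`, `anchors`, `anomaly_scores`, `sigma`) are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel value between two equal-length points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernelize(data, anchors, sigma=1.0):
    """Map each point to its kernel similarities to the anchors (assumed feature map)."""
    return [[gaussian_kernel(x, a, sigma) for a in anchors] for x in data]


def avg_path_length(n):
    """Standard iForest normalizer c(n): average path length for n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Isolation tree whose split value is the attribute mean (an assumption)."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only attributes that still vary in this node can separate points.
    candidates = [j for j in range(len(rows[0]))
                  if min(r[j] for r in rows) < max(r[j] for r in rows)]
    if not candidates:
        return {"size": len(rows)}
    attr = rng.choice(candidates)
    split = sum(r[attr] for r in rows) / len(rows)
    left = [r for r in rows if r[attr] < split]
    right = [r for r in rows if r[attr] >= split]
    if not left or not right:  # guard against floating-point ties
        return {"size": len(rows)}
    return {"attr": attr, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(row, node):
    """Depth at which `row` lands, plus c(size) for an unresolved leaf."""
    depth = 0
    while "attr" in node:
        node = node["left"] if row[node["attr"]] < node["split"] else node["right"]
        depth += 1
    return depth + avg_path_length(node["size"])


def anomaly_scores(features, n_trees=50, sample_size=64, seed=0):
    """iForest-style score in (0, 1]; higher means more anomalous. Needs >= 2 points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(features))
    max_depth = max(1, math.ceil(math.log2(psi)))
    trees = [build_tree(rng.sample(features, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(psi)
    return [2.0 ** (-sum(path_length(f, t) for t in trees) / (n_trees * norm))
            for f in features]


# Toy data: a tight 5x5 grid cluster plus one far-away point.
cluster = [(0.1 * i, 0.1 * j) for i in range(-2, 3) for j in range(-2, 3)]
data = cluster + [(3.0, 3.0)]
anchors = [(0.0, 0.0), (-0.2, -0.2), (0.2, 0.2), (-0.2, 0.2)]  # hand-picked for the demo
scores = anomaly_scores(kernelize(data, anchors))
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(data[most_anomalous], round(scores[most_anomalous], 3))
```

In this toy setting the far-away point has near-zero similarity to every anchor, so a mean-based split separates it from the cluster within the first one or two levels of each tree, which is the behavior the abstract attributes to working in the kernelized space. A realistic version would choose anchors and `sigma` from the data and would add the hashing step that this sketch leaves out.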