Anomaly detectors are often used to produce a ranked list of statistical anomalies, which human analysts then examine to extract the anomalies that are actually of interest. In real-world applications, this process can be exceedingly difficult for the analyst, because a large fraction of the high-ranking anomalies are false positives that are not interesting from the application perspective. In this paper, we aim to make the analyst's job easier by allowing the analyst to provide feedback during the investigation. Ideally, the feedback influences the detector's ranking in a way that reduces the number of false positives that must be examined before the anomalies of interest are discovered. In particular, we introduce a novel technique for incorporating simple binary feedback into tree-based anomaly detectors. We focus on the Isolation Forest algorithm as a representative tree-based anomaly detector and show that incorporating feedback significantly improves its performance over the baseline algorithm, which uses no feedback. Our technique is simple and scales well as the size of the data increases, which makes it suitable for interactive discovery of anomalies in large datasets.
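To make the interactive setting concrete, the following is a minimal sketch of a feedback loop on toy data. It is not the technique introduced in this paper and it does not build an Isolation Forest. It assumes that each instance is described by a small vector of per-component anomaly scores (hypothetical stand-ins for contributions from parts of a tree ensemble), that the ranking score is a weighted sum of those components, and that binary feedback adjusts the weights with a simple additive update. The names `ITEMS`, `score`, and `false_positives_before_all_found`, the toy values, and the learning rate are all illustrative assumptions.

```python
# Toy illustration of ranking with binary analyst feedback.
# Assumption: each instance has per-component anomaly scores, and the
# ranking score is their weighted sum. This is NOT the paper's method.

# (name, component scores, is it an anomaly of interest?)
ITEMS = [
    ("fp1", [0.90, 0.20], False),  # statistical outlier, not interesting
    ("fp2", [0.80, 0.25], False),
    ("fp3", [0.85, 0.10], False),
    ("a1", [0.30, 0.70], True),    # anomaly of interest
    ("a2", [0.20, 0.70], True),
    ("n1", [0.10, 0.10], False),   # nominal points
    ("n2", [0.20, 0.10], False),
]


def score(weights, features):
    """Weighted sum of the per-component anomaly scores."""
    return sum(w * x for w, x in zip(weights, features))


def false_positives_before_all_found(items, learning_rate):
    """Simulate analyst review of the top-ranked instance, one at a time.

    Returns how many false positives the analyst examines before every
    anomaly of interest has been found. A learning rate of 0.0 disables
    feedback and reproduces the fixed baseline ranking.
    """
    weights = [1.0] * len(items[0][1])
    remaining = list(items)
    anomalies_left = sum(1 for _, _, is_anomaly in items if is_anomaly)
    false_positives = 0

    while anomalies_left > 0:
        # Present the currently highest-ranked unlabeled instance.
        top = max(remaining, key=lambda item: score(weights, item[1]))
        remaining.remove(top)
        _, features, is_anomaly = top

        if is_anomaly:
            anomalies_left -= 1
        else:
            false_positives += 1

        # Binary feedback: raise weights on components that flagged a real
        # anomaly, lower them on components that flagged a false positive.
        sign = 1.0 if is_anomaly else -1.0
        weights = [w + learning_rate * sign * x
                   for w, x in zip(weights, features)]

    return false_positives


if __name__ == "__main__":
    print("baseline:", false_positives_before_all_found(ITEMS, 0.0))
    print("with feedback:", false_positives_before_all_found(ITEMS, 0.5))
```

On this toy data the baseline ranking makes the analyst examine three false positives before both anomalies of interest are found, while the feedback variant examines one. The numbers only illustrate the evaluation criterion described above and say nothing about the performance of the actual technique.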