Detecting anomalies in streaming data is an important problem in many application domains, such as cybersecurity, natural disaster monitoring, and bank fraud detection. Several families of approaches have been designed to detect anomalies: statistics-based, isolation-based, clustering-based, etc. In this paper, we present a structured survey of existing anomaly detection methods for data streams, with an in-depth look at Isolation Forest (iForest). We first provide an implementation of Isolation Forest Anomalies detection in Stream Data (IForestASD), a variant of iForest for data streams. This implementation is built on top of scikit-multiflow (River), an open source machine learning framework for data streams that contains a single anomaly detection algorithm for data streams, Streaming Half-Space Trees. We performed experiments on several well-known real data sets to compare the performance of our IForestASD implementation with that of Half-Space Trees. Moreover, we extended the IForestASD algorithm to handle drifting data by proposing three algorithms built on two well-known drift detection methods: ADWIN and KSWIN. ADWIN is an adaptive sliding window algorithm for detecting change in a data stream. KSWIN (Kolmogorov–Smirnov Windowing) is a more recent concept drift detection method; we extended it to handle n-dimensional data streams. We validated and compared all of the proposed methods on both real and synthetic data sets, evaluating the F1-score, the execution time, and the memory consumption. The experiments show that our extensions consume fewer resources than the original version of IForestASD while achieving similar or better detection performance.
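As an illustration of the idea behind Kolmogorov–Smirnov windowing on n-dimensional data, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions that the abstract does not state: drift is tested independently on each dimension, the significance level is split across dimensions with a Bonferroni correction, and the standard asymptotic critical value of the two-sample KS test is used. The names `ks_statistic`, `ks_threshold`, and `drift_detected` are hypothetical.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def ks_threshold(n, m, alpha):
    """Standard asymptotic critical value of the two-sample KS test."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def drift_detected(window, recent_size=30, alpha=0.01, rng=None):
    """Compare the most recent points of a sliding window with a random
    sample of the older points, one dimension at a time.

    Assumption (not from the paper): drift is reported as soon as any
    single dimension rejects, with alpha divided by the number of
    dimensions (Bonferroni correction).
    """
    if rng is None:
        rng = random.Random(0)
    if len(window) < 2 * recent_size:
        return False  # not enough history to compare two samples
    recent = window[-recent_size:]
    older = rng.sample(window[:-recent_size], recent_size)
    n_dims = len(window[0])
    threshold = ks_threshold(recent_size, recent_size, alpha / n_dims)
    return any(
        ks_statistic([p[j] for p in older], [p[j] for p in recent]) > threshold
        for j in range(n_dims)
    )


# Usage: a 3-dimensional window whose second feature jumps in the last 30 points.
shifted = [(0.0, 0.0, 0.0)] * 70 + [(0.0, 1.0, 0.0)] * 30
print(drift_detected(shifted))
```

In a drift-aware variant of a window-based detector, a positive result from such a check would typically trigger a rebuild of the model on the most recent window; how the paper's three proposed algorithms use the signal is not specified in this abstract.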