
Scalable Multi-Parameter Outlier Detection Technology

The real-time detection of anomalous phenomena over streaming data has become increasingly important for applications ranging from fraud detection and financial analysis to traffic management. In these streaming applications, a large number of similar continuous outlier detection queries are often executed concurrently. Given the high algorithmic complexity of detecting and maintaining outlier patterns independently for each parameter setting, we propose a shared execution methodology called SOP that handles a large batch of requests with diverse pattern configurations. First, our systematic analysis reveals opportunities for maximum resource sharing by leveraging commonalities among outlier detection queries. To exploit them, we introduce a sharing strategy that integrates all computation results into one compact data structure, which leverages temporal relationships among stream data points to prioritize the probing process. Second, this work is the first to consider predicate constraints in the outlier detection context. By distinguishing between target and scope constraints, customized fragment sharing and block selection strategies can be applied to maximize the efficiency of system resource utilization. Our experimental studies using real stream data demonstrate that our approach runs three orders of magnitude faster than the state of the art and scales to thousands of queries.
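The abstract does not define the query model, so the sketch below assumes the common distance-based definition: a point is an outlier for query (k, r) if fewer than k other points in the window lie within distance r of it. It is not SOP itself; it only illustrates why sharing pays off. Instead of scanning the window once per query, each point is scanned once, and its k_max nearest distances within r_max (the largest k and r in the batch) form a compact summary that answers every query. The names `shared_outliers`, `stream_outliers`, `Point` and `Query` are hypothetical, and the window is recomputed from scratch rather than maintained incrementally.

```python
import heapq
import math
from bisect import bisect_right
from collections import deque

Point = tuple[float, ...]
Query = tuple[int, float]  # hypothetical (k, r) outlier query parameters


def shared_outliers(window: list[Point], queries: list[Query]) -> dict[Query, set[int]]:
    """Answer every (k, r) query over one window with a single scan per point.

    Assumed definition: point p is an outlier for (k, r) if fewer than k
    other points in the window lie within distance r of p.
    """
    k_max = max(k for k, _ in queries)
    r_max = max(r for _, r in queries)
    results: dict[Query, set[int]] = {q: set() for q in queries}
    for i, p in enumerate(window):
        # Shared step: compute distances to all other points once.
        near = [math.dist(p, o) for j, o in enumerate(window) if j != i]
        # Compact summary: the k_max smallest distances that are <= r_max.
        # bisect_right on it yields min(neighbor count within r, k_max),
        # which suffices to test "count < k" for every k <= k_max.
        summary = heapq.nsmallest(k_max, (d for d in near if d <= r_max))
        for k, r in queries:
            if bisect_right(summary, r) < k:
                results[(k, r)].add(i)
    return results


def stream_outliers(stream, window_size: int, slide: int, queries: list[Query]):
    """Count-based sliding window that re-evaluates each full window.

    Yields (arrival_count, results); indices in results are positions
    within the current window. No incremental maintenance is attempted.
    """
    window = deque(maxlen=window_size)
    for t, point in enumerate(stream, start=1):
        window.append(point)
        if t >= window_size and (t - window_size) % slide == 0:
            yield t, shared_outliers(list(window), queries)


if __name__ == "__main__":
    window = [(0.0,), (1.0,), (2.0,), (10.0,)]
    queries = [(1, 1.5), (2, 1.5), (2, 8.5)]
    print(shared_outliers(window, queries))
    # {(1, 1.5): {3}, (2, 1.5): {0, 2, 3}, (2, 8.5): {3}}
```

The design choice to keep only k_max distances bounded by r_max is what lets one structure serve many parameter settings; a production system would additionally update these summaries as points enter and expire from the window.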

Jiayuan Wang
