A Parameter Space Framework for Online Outlier Detection Over High-Volume Data Streams

In diverse applications ranging from social networks to location-based online services to traffic monitoring, data streams are continuously monitored by multiple outlier analysts, each working with different parameter settings. Real-time response to such complex outlier analytics over high-speed streaming data has been recognized as critical in many domains. In this paper, we propose a parameter space framework, called PSOD, for online outlier detection over sliding-window streams. PSOD supports a large variety of query requests in parameter space, with diverse settings for both pattern and window parameters. First, we design a neighbor table that records the neighbors of each point in different distance intervals and different slides, which enables us to maximally reuse already acquired neighbor information across the entire parameter space. In addition, we propose a series of sharing strategies for the sliding-window environment that minimize processing cost by eliminating redundant query requests. Moreover, PSOD transforms the query group in 4-D parameter space into a periodic query group in 3-D parameter space to minimize the number of queries. Our experimental study on three real-world streaming datasets demonstrates that PSOD reduces CPU cost by more than 100-fold compared with the state-of-the-art method.
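The neighbor-table idea can be illustrated with a minimal sketch. It is not PSOD's actual data structure, which the abstract does not specify. It assumes the common distance-threshold definition of an outlier (a point with fewer than k neighbors within distance r), and the names `NeighborTable`, `insert`, `neighbor_count`, and `outliers` are hypothetical. Each point keeps neighbor counts bucketed by slide and by distance interval, so queries with different r, k, and window ranges are answered from the same counts without recomputing distances.

```python
from bisect import bisect_left
from collections import defaultdict
from math import dist


class NeighborTable:
    """Hypothetical sketch: per-point neighbor counts keyed by (slide, distance interval)."""

    def __init__(self, radii):
        # Sorted distinct radii used by the query group (an assumption).
        # Interval i covers distances in (radii[i-1], radii[i]].
        self.radii = sorted(set(radii))
        self.points = {}  # point id -> (slide index, coordinates)
        # point id -> {(neighbor's slide, interval index): neighbor count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def insert(self, pid, slide, coords):
        """Add a point and update counts on both sides of every close pair."""
        for qid, (qslide, qcoords) in self.points.items():
            i = bisect_left(self.radii, dist(coords, qcoords))
            if i < len(self.radii):  # farther than the largest radius: ignore
                self.counts[pid][(qslide, i)] += 1
                self.counts[qid][(slide, i)] += 1
        self.points[pid] = (slide, coords)

    def neighbor_count(self, pid, r, first_slide, last_slide):
        """Neighbors of pid within distance r in slides first_slide..last_slide.

        r is assumed to be one of the radii given to the constructor.
        """
        max_interval = bisect_left(self.radii, r)
        return sum(
            c
            for (s, i), c in self.counts[pid].items()
            if first_slide <= s <= last_slide and i <= max_interval
        )

    def outliers(self, r, k, first_slide, last_slide):
        """Points in the window with fewer than k neighbors within distance r."""
        return sorted(
            pid
            for pid, (s, _) in self.points.items()
            if first_slide <= s <= last_slide
            and self.neighbor_count(pid, r, first_slide, last_slide) < k
        )


table = NeighborTable(radii=[1.0, 2.0])
table.insert("a", 0, (0.0,))
table.insert("b", 0, (0.5,))
table.insert("c", 1, (1.8,))
table.insert("d", 2, (10.0,))

# Three queries with different (r, k, window) settings share one table.
print(table.outliers(r=1.0, k=1, first_slide=0, last_slide=2))
print(table.outliers(r=2.0, k=2, first_slide=0, last_slide=2))
print(table.outliers(r=2.0, k=2, first_slide=1, last_slide=2))
```

The sketch never evicts expired points; it only filters them by slide range at query time. A streaming implementation would also drop points and count entries once their slide leaves the largest window.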
