Fast Memory Efficient Local Outlier Detection in Data Streams (Extended Abstract)

Outlier detection is an important task in data mining. With the growing need to analyze high-speed data streams, outlier detection becomes even more challenging, because traditional techniques can no longer assume that all the data can be stored for processing. While the well-known Local Outlier Factor (LOF) algorithm has an incremental version (called iLOF), it assumes unbounded memory to keep all previous data points. In this paper, we propose a memory-efficient incremental local outlier (MiLOF) detection algorithm for data streams, and a more flexible version (MiLOF_F). Both achieve accuracy close to that of iLOF while staying within a fixed memory bound. In addition, MiLOF_F is robust to changes in the number of data points, the underlying clusters, and the number of dimensions in the data stream.
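
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard batch LOF score, not of iLOF or MiLOF. It is included only to illustrate why LOF-style methods depend on access to all earlier points: every score is built from k-nearest-neighbor distances over the full data set. The function name `lof_scores`, the brute-force distance computation, and the choice to use exactly k neighbors (ties broken by index) are assumptions made for this illustration.

```python
import math


def lof_scores(points, k):
    """Batch LOF scores for a list of equal-length numeric tuples.

    Illustrative brute-force version (O(n^2) time and memory).
    Assumes the points are distinct enough that no point has a
    zero mean reachability distance (which would divide by zero).
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # All pairwise Euclidean distances: this is the part that
    # requires every previous point to be kept in memory.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []  # indices of the k nearest neighbors of each point
    k_dist = []     # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        knn = order[:k]  # simplification: exactly k neighbors
        neighbors.append(knn)
        k_dist.append(dist[i][knn[-1]])

    # Local reachability density: inverse of the mean reachability
    # distance from a point to its neighbors.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(1.0 / (sum(reach) / k))

    # LOF: mean ratio of the neighbors' densities to the point's own.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i])
            for i in range(n)]
```

Under these assumptions, points inside a uniform cluster score close to 1, while a point far from any cluster scores well above 1. For example, `lof_scores([(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)], 2)` gives the isolated point `(10, 10)` the largest score.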
