Attribute Outlier Detection over Data Streams

Outlier detection is widely used in data stream applications such as network intrusion detection and fraud detection. However, most existing algorithms focus on detecting class outliers, and there is little work on detecting attribute outliers, a task that must consider the correlation or relevance among data items. In this paper we study the problem of detecting attribute outliers within sliding windows over data streams. We propose an efficient algorithm that performs exact outlier detection. The algorithm relies on an efficient data structure that stores only the necessary information and handles the updates caused by data arrival and expiration at minimal cost. To address the problem of limited memory, we also present an approximate algorithm that selectively drops data within the current window while guaranteeing a maximum error bound. Extensive experiments show that our algorithms are efficient and effective.
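To make the sliding-window setting concrete, the following is a minimal sketch and not the algorithm or data structure of the paper, whose details the abstract does not give. It assumes a count-based window over records with categorical attributes, and it treats an attribute value as suspicious when it rarely co-occurs with another value of the same record that is itself frequent in the window. The class name `SlidingWindowCooccurrence` and the parameters `min_support` and `max_conf` are hypothetical choices made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


def _bump(counter, key, delta):
    """Adjust a count and drop the entry once it reaches zero."""
    counter[key] += delta
    if counter[key] == 0:
        del counter[key]


class SlidingWindowCooccurrence:
    """Illustrative sketch: co-occurrence counts over a count-based sliding window."""

    def __init__(self, window_size, min_support=3, max_conf=0.2):
        self.window_size = window_size
        self.min_support = min_support  # assumed: context value must occur at least this often
        self.max_conf = max_conf        # assumed: conditional frequency at or below this is suspicious
        self.window = deque()
        self.single = Counter()         # (attr_index, value) -> count in window
        self.pair = Counter()           # ((i, vi), (j, vj)) with i < j -> count in window

    def _update(self, record, delta):
        items = list(enumerate(record))
        for item in items:
            _bump(self.single, item, delta)
        for a, b in combinations(items, 2):
            _bump(self.pair, (a, b), delta)

    def add(self, record):
        """Handle an arrival, then expire the oldest record if the window is full."""
        self.window.append(record)
        self._update(record, +1)
        if len(self.window) > self.window_size:
            self._update(self.window.popleft(), -1)

    def suspicious_attributes(self, record):
        """Return indices of attributes whose value is rare given a frequent co-occurring value."""
        items = list(enumerate(record))
        flagged = set()
        for a, b in combinations(items, 2):
            joint = self.pair[(a, b)]
            for target, context in ((a, b), (b, a)):
                support = self.single[context]
                if support >= self.min_support and joint / support <= self.max_conf:
                    flagged.add(target[0])
        return sorted(flagged)


if __name__ == "__main__":
    detector = SlidingWindowCooccurrence(window_size=6)
    for _ in range(5):
        detector.add(("http", "80"))
    detector.add(("http", "22"))
    print(detector.suspicious_attributes(("http", "22")))
```

In this sketch each arrival or expiration touches only the counts of the affected record, so the cost of an update depends on the number of attributes rather than on the window size. An approximate variant with a memory budget would need an additional policy for dropping records, which is not attempted here.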
