Atypicity detection in data streams: A self-adjusting approach

Outlyingness is a subjective concept that depends on how isolated a record (or set of records) is. Clustering-based outlier detection clusters the data and then identifies outliers from the characteristics of the resulting clusters (e.g. small, tight and/or dense clusters might be considered outliers). Existing methods require a parameter that stands for the "level of outlyingness", such as the maximum size or a percentage of small clusters, in order to build the set of outliers. Unfortunately, setting this parameter manually is not feasible in a streaming environment, given the fast response time usually needed. In this paper we propose Wod, a method that separates outliers from clusters by means of a natural and effective principle. The main advantages of Wod are that it adjusts automatically to any clustering result and that it is parameterless.
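
To make the role of the "level of outlyingness" parameter concrete, the following is a minimal sketch of the kind of parameter-dependent rule attributed above to existing methods: every record that falls in a cluster of at most a given size is flagged as an outlier. It is not Wod. The function name small_cluster_outliers, the parameter max_size and the toy labels are assumptions made only for illustration.

```python
from collections import Counter


def small_cluster_outliers(labels, max_size):
    """Return indices of records whose cluster has at most max_size members.

    labels holds one cluster id per record. max_size plays the role of the
    user-chosen "level of outlyingness" (hypothetical name, for illustration).
    """
    sizes = Counter(labels)  # cluster id -> number of records in that cluster
    return [i for i, label in enumerate(labels) if sizes[label] <= max_size]


# Toy clustering result: cluster 0 has 5 records, cluster 1 has 2, cluster 2 has 1.
labels = [0, 0, 0, 0, 0, 1, 1, 2]

strict = small_cluster_outliers(labels, max_size=1)  # only singleton clusters
loose = small_cluster_outliers(labels, max_size=2)   # also clusters of size 2
print(strict, loose)
```

The two calls return different outlier sets for the same clustering, which illustrates why a hand-tuned threshold is hard to maintain when clustering results change continuously in a stream.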
