A Multi-resolution Approach for Atypical Behaviour Mining

Atypical behaviours are a source of valuable knowledge in security-related domains (e.g. credit card fraud detection [1], cyber security [4] or the safety of critical systems [6]). Atypicity generally depends on how isolated a record (or a set of records) is compared to the rest of the dataset. One possible method for finding atypical records works in two steps: first, a clustering step groups the records by similarity; second, the clusters that contain too few records are identified as atypical. The main problem is to tune the method so that it finds the right level of atypicity. This issue is even more important for data streams, where a decision has to be taken in a very short time and the end-user does not want to try several settings. In this paper, we propose Mrab, a self-adjusting approach that automatically discovers atypical behaviours in the results of a clustering algorithm, without any parameter. We provide the formal framework of our method and evaluate our proposal through a set of experiments.
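The generic two-step scheme described above can be sketched as follows. This is not Mrab: the single-pass leader clustering, the `radius` value and the fixed `min_size` threshold are illustrative assumptions, chosen only to show how the set of "atypical" records changes with the threshold that Mrab aims to set automatically.

```python
import math
from collections import Counter


def leader_clustering(points, radius):
    """Step 1 (assumed clustering method, not the paper's): single-pass
    leader clustering. Each point joins the first leader within `radius`,
    otherwise it becomes the leader of a new cluster."""
    leaders = []  # representative point of each cluster
    labels = []   # cluster index assigned to each point
    for p in points:
        for idx, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                labels.append(idx)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels


def atypical_clusters(labels, min_size):
    """Step 2: flag clusters holding fewer than `min_size` records.
    `min_size` is a hypothetical hand-set threshold."""
    sizes = Counter(labels)
    return sorted(c for c, n in sizes.items() if n < min_size)


def atypical_records(points, radius, min_size):
    """Run both steps and return the records of the flagged clusters."""
    labels = leader_clustering(points, radius)
    flagged = set(atypical_clusters(labels, min_size))
    return [p for p, c in zip(points, labels) if c in flagged]


if __name__ == "__main__":
    # Toy data: two dense groups and one isolated record.
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
           (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
    print(atypical_records(pts, radius=1.0, min_size=2))  # [(20, 20)]
    print(atypical_records(pts, radius=1.0, min_size=4))  # also flags the 3-record group
```

With `min_size=2` only the isolated record is reported, whereas `min_size=4` also flags the three-record group: the same clustering yields different answers depending on a threshold the end-user would otherwise have to tune.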

[1] Yelena Yesha, et al. Data Mining: Next Generation Challenges and Future Directions, 2004.

[2] Raymond T. Ng, et al. Algorithms for Mining Distance-Based Outliers in Large Datasets, 1998, VLDB.

[3] Hans-Peter Kriegel, et al. LOF: identifying density-based local outliers, 2000, SIGMOD '00.

[4] Leonid Portnoy, et al. Intrusion detection with unlabeled data using clustering, 2000.

[5] Shian-Shyong Tseng, et al. Two-phase clustering process for outliers detection, 2001, Pattern Recognit. Lett.

[6] Osmar R. Zaïane, et al. A Nonparametric Outlier Detection for Effectively Discovering Top-N Outliers from Engineering Data, 2006, PAKDD.

[7] Christos Faloutsos, et al. LOCI: fast outlier detection using the local correlation integral, 2003, Proceedings 19th International Conference on Data Engineering.

[8] Bernd Freisleben, et al. CARDWATCH: a neural network based database mining system for credit card fraud detection, 1997, Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr).

[9] Mohammed J. Zaki, et al. ADMIT: anomaly-based data mining for intrusions, 2002, KDD.

[10] Ingrid Daubechies. Ten Lectures on Wavelets, 1992.

[11] Christopher Leckie, et al. Adaptive Clustering for Network Intrusion Detection, 2004, PAKDD.

[12] Randy K. Young. Wavelet theory and its applications, 1993, The Kluwer international series in engineering and computer science.

[13] Hans-Peter Kriegel, et al. LOF: identifying density-based local outliers, 2000, SIGMOD 2000.

[14] Taghi M. Khoshgoftaar, et al. Clustering-Based Network Intrusion Detection, 2007.

[15] Anthony K. H. Tung, et al. Mining top-n local outliers in large databases, 2001, KDD '01.

[16] Takehisa Yairi, et al. An approach to spacecraft anomaly detection problem using kernel feature space, 2005, KDD '05.