Finding outliers at multiple scales

Outlier detection targets exceptional data whose patterns are rare and that lie in low-density regions. In this paper, under the assumption of complete spatial randomness inside clusters, we propose MDV (Multi-scale Deviation of the Volume), an approach to identifying outliers. In addition to assigning an outlier score to each object, it directly outputs a crisp outlier set. It also offers a plot showing the data structure in the vicinity of each object, which helps explain why that object may be outlying. Finally, we demonstrate the effectiveness of MDV on both artificial and real datasets.
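
The abstract does not spell out how MDV measures deviation, so the following is only a minimal sketch of what the complete spatial randomness (CSR) assumption implies, not the MDV algorithm itself. Under CSR with intensity λ, the volume of the smallest ball around a point that contains its k nearest neighbours has expectation k/λ, so the ratio of observed to expected volume should stay near 1 inside a cluster and grow large for an isolated point. The function names, the choice of scales `ks`, and the synthetic data are assumptions made for illustration.

```python
import math
import random


def ball_volume(radius, dim):
    """Volume of a dim-dimensional Euclidean ball of the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def knn_volume_ratios(points, query, intensity, ks):
    """Ratio of the observed k-NN ball volume to its CSR expectation k / intensity.

    Hypothetical helper: one ratio per scale k in ks. Values near 1 are
    consistent with CSR; values much larger than 1 suggest a sparse vicinity.
    """
    dim = len(query)
    # Sorted distances from the query to every other point.
    dists = sorted(math.dist(query, p) for p in points if p is not query)
    return [ball_volume(dists[k - 1], dim) * intensity / k for k in ks]


# Synthetic cluster: n points drawn uniformly in the unit square (CSR).
rng = random.Random(0)
n = 2000
cluster = [(rng.random(), rng.random()) for _ in range(n)]
intensity = n / 1.0  # points per unit area (assumed known here)

ks = [5, 10, 20, 40]          # neighbourhood sizes, i.e. the "scales"
inlier = (0.5, 0.5)           # interior of the cluster
isolated = (1.5, 1.5)         # far outside the cluster

inlier_ratios = knn_volume_ratios(cluster, inlier, intensity, ks)
isolated_ratios = knn_volume_ratios(cluster, isolated, intensity, ks)

for k, a, b in zip(ks, inlier_ratios, isolated_ratios):
    print(f"k={k:3d}  inlier ratio={a:6.2f}  isolated ratio={b:8.2f}")
```

Inspecting the ratios across several values of k, rather than at one fixed k, is what makes such a check multi-scale. In practice the intensity would have to be estimated per cluster, and points near a cluster boundary would need an edge correction, neither of which this sketch attempts.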
