Cluster-based outlier detection

Outlier detection has important applications in the field of data mining, such as fraud detection, customer behavior analysis, and intrusion detection. Outlier detection is the process of detecting the data objects which are grossly different from or inconsistent with the remaining set of data. Outliers are traditionally considered as single points; however, there is a key observation that many abnormal events have both temporal and spatial locality, which might form small clusters that also need to be deemed as outliers. In other words, not only a single point but also a small cluster can probably be an outlier. In this paper, we present a new definition for outliers: cluster-based outlier, which is meaningful and provides importance to the local data behavior, and how to detect outliers by the clustering algorithm LDBSCAN (Duan et al. in Inf. Syst. 32(7):978–986, 2007) which is capable of finding clusters and assigning LOF (Breunig et al. in Proceedings of the 2000 ACM SIG MOD International Conference on Manegement of Data, ACM Press, pp. 93–104, 2000) to single points.

[1]  Aidong Zhang,et al.  WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases , 1998, VLDB.

[2]  Raymond T. Ng,et al.  Finding Intensional Knowledge of Distance-Based Outliers , 1999, VLDB.

[3]  Lida Xu,et al.  Feature space theory - a mathematical foundation for data mining , 2001, Knowl. Based Syst..

[4]  M. Braga,et al.  Exploratory Data Analysis , 2018, Encyclopedia of Social Network Analysis and Mining. 2nd Ed..

[5]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[6]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[7]  Lida Xu,et al.  An Integrated Approach for Agricultural Ecosystem Management , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[8]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[9]  Shian-Shyong Tseng,et al.  Two-phase clustering process for outliers detection , 2001, Pattern Recognit. Lett..

[10]  Helder Gomes Costa,et al.  Application of an integrated decision support process for supplier selection , 2007, Enterp. Inf. Syst..

[11]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[12]  Hans-Peter Kriegel,et al.  OPTICS: ordering points to identify the clustering structure , 1999, SIGMOD '99.

[13]  Mei Zhang,et al.  A rough set approach to knowledge reduction based on inclusion degree and evidence reasoning theory , 2003, Expert Syst. J. Knowl. Eng..

[14]  Qing He,et al.  MSMiner - a developing platform for OLAP , 2007, Decis. Support Syst..

[15]  Cheng Hsu,et al.  An industrial network flow information integration model for supply chain management and intelligent transportation , 2007, Enterp. Inf. Syst..

[16]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[17]  Wen-Xiu Zhang,et al.  A knowledge processing method for intelligent systems based on inclusion degree , 2003, Expert Syst. J. Knowl. Eng..

[18]  Daniel A. Keim,et al.  An Efficient Approach to Clustering in Large Multimedia Databases with Noise , 1998, KDD.

[19]  Lida Xu,et al.  Feature space theory in data mining: transformations between extensions and intensions in knowledge representation , 2003, Expert Syst. J. Knowl. Eng..

[20]  W. R. Buckland,et al.  Outliers in Statistical Data , 1979 .

[21]  Zengyou He,et al.  Discovering cluster-based local outliers , 2003, Pattern Recognit. Lett..

[22]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[23]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[24]  Jiawei Han,et al.  CLARANS: A Method for Clustering Objects for Spatial Data Mining , 2002, IEEE Trans. Knowl. Data Eng..

[25]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[26]  Lian Duan,et al.  A Local Density Based Spatial Clustering Algorithm with Noise , 2006, 2006 IEEE International Conference on Systems, Man and Cybernetics.

[27]  Michael Ian Shamos,et al.  Computational geometry: an introduction , 1985 .

[28]  Li Xu Advances in intelligent information processing , 2006, Expert Syst. J. Knowl. Eng..

[29]  A. Madansky Identification of Outliers , 1988 .

[30]  Jean-Paul Jamont,et al.  Flood decision support system on agent grid: method and implementation , 2007, Enterp. Inf. Syst..

[31]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[32]  Jiong Yang,et al.  STING: A Statistical Information Grid Approach to Spatial Data Mining , 1997, VLDB.

[33]  Theodore Johnson,et al.  Fast Computation of 2-Dimensional Depth Contours , 1998, KDD.

[34]  Azer Bestavros,et al.  Self-similarity in World Wide Web traffic: evidence and possible causes , 1996, SIGMETRICS '96.