A Comparative Study for Outlier Detection Techniques in Data Mining

Existing studies in data mining mostly focus on finding patterns in large datasets and further using it for organizational decision making. However, finding such exceptions and outliers has not yet received as much attention in the data mining field as some other topics have, such as association rules, classification and clustering. Thus, this paper describes the performance of control chart, linear regression, and Manhattan distance techniques for outlier detection in data mining. Experimental studies show that outlier detection technique using control chart is better than the technique modeled from linear regression because the number of outlier data detected by control chart is smaller than linear regression. Further, experimental studies shows that Manhattan distance technique outperformed compared with the other techniques when the threshold values increased

[1]  Philip S. Yu,et al.  Outlier detection for high dimensional data , 2001, SIGMOD '01.

[2]  Aidong Zhang,et al.  FindOut: Finding Outliers in Very Large Datasets , 2002, Knowledge and Information Systems.

[3]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[4]  Shian-Shyong Tseng,et al.  Two-phase clustering process for outliers detection , 2001, Pattern Recognit. Lett..

[5]  Raymond T. Ng,et al.  Distance-based outliers: algorithms and applications , 2000, The VLDB Journal.

[6]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[7]  Philip S. Yu,et al.  An effective and efficient algorithm for high-dimensional outlier detection , 2005, The VLDB Journal.

[8]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[9]  Hongxing He,et al.  A comparative study of RNN for outlier detection in data mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..