A Comparative Study of Outlier Mining and Class Outlier Mining

Outliers can significantly affect data mining performance. Outlier mining is an important issue in knowledge discovery and data mining and has attracted increasing interest in recent years. Class outlier mining is a promising research direction, but little research has been done on it so far. This paper has two main goals. The first is to show the significance of Class Outlier Mining through a comparative study of a Class Outlier detection method, Class Outlier Distance Based (CODB), and a conventional Outlier detection method. The second is to introduce the Enhanced Class Outlier Distance Based (ECODB) algorithm, an enhancement of CODB that reduces the number of CODB parameters using a heuristic approach. The experimental results show that CODB can detect Class Outliers that cannot be detected using conventional Outlier detection methods. The experiments also show that ECODB works as efficiently as CODB.
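The distinction between a conventional outlier and a class outlier can be illustrated with a minimal sketch. This is not the CODB or ECODB algorithm, whose details are not given here; it only assumes two simple, hypothetical scores: the distance to the k-th nearest neighbour as a conventional distance-based score, and the fraction of the k nearest neighbours carrying a different class label as a class-outlier indicator. The toy data, the function names, and the choice k = 3 are all assumptions made for illustration.

```python
import math

def k_nearest(points, i, k):
    """Return (indices of the k nearest neighbours of points[i], distance to the k-th)."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]], dists[k - 1][0]

def knn_distance_score(points, i, k):
    """Conventional distance-based score: distance to the k-th nearest neighbour."""
    return k_nearest(points, i, k)[1]

def label_disagreement(points, labels, i, k):
    """Illustrative class-outlier indicator: share of the k nearest
    neighbours whose label differs from the label of points[i]."""
    neighbours, _ = k_nearest(points, i, k)
    return sum(labels[j] != labels[i] for j in neighbours) / k

# Toy data (assumed): a tight cluster of class "a", a tight cluster of class "b",
# one "b"-labelled point sitting inside the "a" cluster (index 4),
# and one isolated point far from everything (index 9).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (20.0, 20.0),
]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b", "b"]

k = 3
distance_scores = [knn_distance_score(points, i, k) for i in range(len(points))]
disagreement = [label_disagreement(points, labels, i, k) for i in range(len(points))]

top_distance = max(range(len(points)), key=distance_scores.__getitem__)
top_class = max(range(len(points)), key=disagreement.__getitem__)

print("largest kNN distance at index", top_distance)
print("largest label disagreement at index", top_class)
```

In this toy setting the isolated point (index 9) has the largest k-nearest-neighbour distance, while the mislabelled point inside the "a" cluster (index 4) has an unremarkable distance score but the largest label disagreement. A purely distance-based detector would therefore overlook it, which is the kind of case class outlier mining targets.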
