A new method for noise data detection based on DBSCAN and SVDD

To improve the quality of real-world datasets by removing noisy data, this article proposes a new noise detection method based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and support vector data description (SVDD). First, the classical DBSCAN algorithm is used to cluster the data and remove the outliers. Second, SVDD is trained on each group of the clustering result, yielding one discriminant model per group. All of these discriminant models are then applied to the whole dataset to classify the data. Any point that does not belong to any class is identified as noise and removed. Experiments on UCI data indicate that the proposed method is efficient.
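The two-stage procedure described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dbscan`, `fit_svdd`, `inside`, `detect_noise`), the RBF kernel, and the values of `eps`, `min_pts`, and `gamma` are all hypothetical choices. The SVDD step is simplified to a hard-margin variant without slack variables, fitted approximately with Frank-Wolfe style updates on the dual weights.

```python
import math

def rbf(a, b, gamma):
    """RBF kernel (an assumed choice; the abstract does not name a kernel)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def dbscan(points, eps, min_pts):
    """Classical DBSCAN; returns one label per point, -1 marks an outlier."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:      # not a core point: provisional outlier
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:         # former outlier becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts: # expand only through core points
                queue.extend(nbrs[j])
    return labels

def fit_svdd(train, gamma, iters=200):
    """Approximate hard-margin SVDD: a ball in kernel feature space."""
    n = len(train)
    K = [[rbf(a, b, gamma) for b in train] for a in train]
    alpha = [1.0] + [0.0] * (n - 1)     # dual weights, always sum to 1

    def sq_dists():
        Ka = [sum(alpha[j] * K[i][j] for j in range(n)) for i in range(n)]
        const = sum(alpha[i] * Ka[i] for i in range(n))
        return [K[i][i] - 2 * Ka[i] + const for i in range(n)], const

    for t in range(1, iters + 1):
        d2, _ = sq_dists()
        far = max(range(n), key=d2.__getitem__)
        step = 1.0 / (t + 1)            # move the center toward the farthest point
        alpha = [a * (1 - step) for a in alpha]
        alpha[far] += step
    d2, const = sq_dists()
    return train, alpha, const, max(d2), gamma

def inside(model, x, tol=1e-9):
    """True if x falls within the ball described by the model."""
    train, alpha, const, r2, gamma = model
    kx = sum(a * rbf(x, t, gamma) for a, t in zip(alpha, train))
    return rbf(x, x, gamma) - 2 * kx + const <= r2 + tol

def detect_noise(points, eps, min_pts, gamma=0.5):
    """Flag each point that no per-cluster SVDD model accepts."""
    labels = dbscan(points, eps, min_pts)
    models = [fit_svdd([p for p, l in zip(points, labels) if l == c], gamma)
              for c in sorted(set(labels) - {-1})]
    return [not any(inside(m, p) for m in models) for p in points]

# Toy data (made up for illustration): two tight clusters and one far point.
cluster_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
cluster_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
points = cluster_a + cluster_b + [(2.5, 10.0)]
flags = detect_noise(points, eps=0.3, min_pts=3)
print(flags)
```

In this sketch the radius of each ball is set to the largest training distance, so every clustered point is accepted by its own model. A soft-margin SVDD would instead let some training points fall outside the boundary, which is one way a point inside a DBSCAN cluster could still be flagged as noise in the second stage.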
