The role of visualization in effective data cleaning

Using visualization techniques to assist conventional data mining tasks has attracted considerable interest in recent years. This paper addresses a challenging issue in the use of visualization for data mining: choosing appropriate parameters for spatial data cleaning methods. On the one hand, visualization improves algorithm performance. On the other hand, the characteristics and properties of the methods and the features of the data are visualized as feedback to the user. A 3-D visualization model, called Waterfall, is proposed to assist spatial data cleaning in four important aspects: dimension-independent data visualization, visualization of data quality, algorithm parameter selection, and measurement of the parameter sensitivity of noise removal methods.
