Linear correlation analysis of numeric attributes for government data

To analyze linear correlations among the numeric attributes of government data, this paper proposes a method based on clustering. A clustering algorithm is first used to prune outliers, and linear correlation analysis is then performed on each cluster separately instead of on the whole dataset. In this way, the method can obtain multiple correlations between the same two attributes, since each cluster can exhibit its own linear relationship. The paper presents an experiment on government social security data. The experimental results show that the proposed method performs much better than traditional regression analysis and association rule analysis.
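
The idea of clustering first and correlating per cluster can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the correlation measure, so everything below is an assumption made for illustration: a small DBSCAN-style density clustering (with hypothetical parameters `eps` and `min_pts`) marks outliers as noise, and the Pearson coefficient is computed separately for each remaining cluster. The data are synthetic, not the social security data used in the paper.

```python
from math import dist, sqrt


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN-style clustering (an assumed choice of algorithm).

    Returns one label per point; -1 marks noise, i.e. pruned outliers.
    """
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlations_by_cluster(points, labels):
    """Compute one correlation per cluster, skipping points labeled as noise."""
    result = {}
    for c in sorted(set(labels) - {-1}):
        members = [p for p, lab in zip(points, labels) if lab == c]
        xs, ys = zip(*members)
        result[c] = pearson(xs, ys)
    return result


# Synthetic data: two groups with opposite trends, plus two isolated outliers.
group_a = [(i, 2 * i + 0.5 * (i % 2)) for i in range(10)]             # rising
group_b = [(50 + i, 40 - 2 * i - 0.5 * (i % 2)) for i in range(10)]   # falling
outliers = [(30.0, 90.0), (100.0, 5.0)]
points = group_a + group_b + outliers

labels = dbscan(points, eps=3.0, min_pts=3)
result = correlations_by_cluster(points, labels)
for c, r in result.items():
    print(f"cluster {c}: r = {r:.3f}")
```

In this toy setting the two groups yield correlations of opposite sign between the same pair of attributes, which a single coefficient computed over the whole dataset would blend together. The values of `eps` and `min_pts` were chosen by hand for this example and would need tuning on real data.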
