An efficient fractal dimension based clustering algorithm

Clustering plays an important role in data mining. It helps reveal the intrinsic structure of data sets with little or no prior knowledge. Clustering approaches have received great attention in recent years. However, many published algorithms perform poorly at determining the number of clusters, finding clusters of arbitrary shape, or identifying the presence of noise. In this paper we present FDC, an efficient clustering algorithm that combines grid, density, and fractal theory. It places points in the same cluster so that the change in fractal dimension is minimized while the self-similarity within clusters is maximized. We show via experiments that FDC can quickly handle large multidimensional data sets, identify the number of clusters, recognize clusters of arbitrary shape, and furthermore extract some qualitative information from the data sets.
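
The idea of assigning a point to the cluster whose fractal dimension changes least can be illustrated with a minimal sketch. This is not the FDC algorithm itself: the abstract does not say which dimension estimator is used or how the grid and density steps work, so the sketch assumes a plain box-counting estimate (the slope of log N(r) against log(1/r), where N(r) is the number of occupied grid cells of side r). The function names and the `cell_sizes` parameter are hypothetical.

```python
import math


def box_count(points, cell_size):
    """Count grid cells of side `cell_size` that contain at least one point."""
    return len({tuple(math.floor(x / cell_size) for x in p) for p in points})


def box_counting_dimension(points, cell_sizes):
    """Estimate the dimension as the least-squares slope of
    log N(r) against log(1/r) over the given cell sizes (assumed estimator)."""
    xs = [math.log(1.0 / r) for r in cell_sizes]
    ys = [math.log(box_count(points, r)) for r in cell_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def assign_by_min_dimension_change(point, clusters, cell_sizes):
    """Return the index of the cluster whose estimated dimension
    changes least when `point` is added to it."""
    changes = []
    for members in clusters:
        before = box_counting_dimension(members, cell_sizes)
        after = box_counting_dimension(members + [point], cell_sizes)
        changes.append(abs(after - before))
    return min(range(len(clusters)), key=changes.__getitem__)
```

Under these assumptions, points sampled along a line segment yield an estimate close to 1 and points filling a square yield an estimate close to 2. Recomputing the estimate from scratch for every candidate point is expensive; a practical implementation would presumably update the per-cell counts incrementally.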
