A dynamic approach for clustering data

Abstract: This paper introduces a new method for clustering data using a dynamic scheme. An appropriate partitioning is obtained from both a dissimilarity measure between pairs of entities and a dynamic splitting procedure. The dissimilarity function is defined as the cost of the optimum path from a datum to each entity on a graph, where the cost of a path is the greatest distance between two successive vertices on the path. The clustering procedure is dynamic in the sense that the initial problem of determining a partition into an unknown number of natural groupings is reduced to a sequence of two-class splitting stages. Not having arisen from any particular application, the proposed approach could be effective in many domains, and it is especially successful at identifying clusters when prior knowledge about the data set is lacking. The usefulness of the dynamic algorithm for handling elongated clusters, clusters that are not piecewise-linearly separable, and sparse and dense groupings is demonstrated on several data sets.
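The path-based dissimilarity described in the abstract can be illustrated with a minimal sketch. The code assumes a complete graph over the data points with Euclidean edge lengths and uses a Dijkstra-style search in which a path's cost is its longest edge rather than the sum of its edges. The function name `minimax_path_costs`, the choice of graph, and the sample points are illustrative assumptions, not details taken from the paper.

```python
import heapq
import math


def minimax_path_costs(points, source):
    """Return, for every point, the cost of the best path from points[source].

    Assumption: the graph is complete with Euclidean edge lengths.
    A path's cost is the greatest distance between two successive vertices.
    """
    n = len(points)
    cost = [math.inf] * n
    cost[source] = 0.0
    done = [False] * n
    heap = [(0.0, source)]
    while heap:
        c, u = heapq.heappop(heap)
        if done[u]:
            continue
        done[u] = True
        for v in range(n):
            if done[v]:
                continue
            # Extend the path with max instead of +, since only the
            # longest single step along the path counts.
            candidate = max(c, math.dist(points[u], points[v]))
            if candidate < cost[v]:
                cost[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return cost


# Hypothetical data: an elongated chain of four points and a distant pair.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
costs = minimax_path_costs(points, 0)
print(costs)
```

In this toy example every point in the chain has cost 1.0 from the first point, although the far end of the chain is 3 units away in a straight line, while the two distant points have cost 7.0, the size of the gap that any path must cross. This is the property that lets such a measure keep elongated groupings together.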
