Characteristics of a Hierarchical Data Clustering Algorithm Based on Gravity Theory

Clustering algorithms that output a hierarchical dendrogram are classified as hierarchical clustering algorithms. The most desirable feature of a hierarchical clustering algorithm is that it generates this dendrogram, which is very important for applications such as biological, social, and behavioral studies, where taxonomies must be constructed. One general problem of modern hierarchical data clustering algorithms is that clustering quality depends heavily on how certain parameters are set. What complicates the situation further is that the optimal parameter setting is data dependent. As a result, different parts of a given data set may require different parameter settings to optimize clustering quality, and applying a global parameter setting to the entire data set may ruin the final result. In such cases, parameter tuning may require human intervention, which is not only time consuming but may also become cumbersome for the user if the dimension of the data set is high. This paper presents the main characteristics of a hierarchical clustering algorithm that overcomes the parameter-tuning problem and delivers favorable clustering quality. The proposed algorithm is based on gravity theory in physics. The studies presented in this paper reveal that the optimal ranges for the parameters of the proposed gravity-based clustering algorithm are wide and essentially not data dependent; therefore, parameter tuning is essentially not required. Another major feature of the proposed gravity-based algorithm is that its clustering quality compares favorably with that of conventional hierarchical clustering algorithms that require no parameter tuning.

Keywords: data clustering, hierarchical clustering, clustering quality, gravity theory.
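To make the two central ideas above concrete, a dendrogram and a merging rule inspired by gravity, the following is a minimal sketch under stated assumptions. It is not the authors' algorithm, whose details are not given in this abstract. It assumes, hypothetically, that every data point is a node of unit mass, that the attraction between two clusters follows Newton's inverse-square form F = m_i m_j / d^2 with the gravitational constant dropped, and that the most strongly attracted pair is merged into a node located at their center of mass. The function name gravity_dendrogram and its merge-record format are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    id: int       # cluster identifier; leaves are 0..n-1
    mass: float   # HYPOTHETICAL: each absorbed data point contributes unit mass
    pos: tuple    # center of mass of the points in this cluster


def gravity_dendrogram(points):
    """Hypothetical gravity-inspired agglomerative clustering (a sketch).

    Repeatedly merges the pair of clusters with the largest attraction
    F = m_i * m_j / d**2 (gravitational constant assumed to be 1) and
    returns the merge history as (left_id, right_id, new_id, distance)
    tuples, i.e. the internal nodes of a dendrogram.
    """
    nodes = [Node(i, 1.0, tuple(p)) for i, p in enumerate(points)]
    next_id = len(nodes)
    merges = []
    while len(nodes) > 1:
        best = None  # (force, index_a, index_b, distance)
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                d = math.dist(nodes[a].pos, nodes[b].pos)
                # Coincident points attract infinitely strongly.
                force = math.inf if d == 0 else nodes[a].mass * nodes[b].mass / d**2
                if best is None or force > best[0]:
                    best = (force, a, b, d)
        _, a, b, d = best
        na, nb = nodes[a], nodes[b]
        m = na.mass + nb.mass
        # The merged node sits at the center of mass of its two children.
        pos = tuple((na.mass * x + nb.mass * y) / m for x, y in zip(na.pos, nb.pos))
        merges.append((na.id, nb.id, next_id, d))
        del nodes[b]                # b > a, so index a is still valid
        nodes[a] = Node(next_id, m, pos)
        next_id += 1
    return merges


merges = gravity_dendrogram([(0, 0), (0, 1), (10, 0), (10, 1)])
print(merges)  # [(0, 1, 4, 1.0), (2, 3, 5, 1.0), (4, 5, 6, 10.0)]
```

Each merge record is one internal node of the dendrogram, and stopping after n - k merges leaves k flat clusters. Because mass enters the force, a large cluster can win a merge against a slightly closer small one, which plain centroid linkage would not allow. The exhaustive pair search makes this sketch O(n^3) overall, so it is suitable only for small illustrations.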