A hierarchical clustering algorithm based on fuzzy graph connectedness

Many clustering methods have been proposed in data mining, but only a few of them address incremental databases. This paper investigates FHC, a hierarchical clustering algorithm based on fuzzy graph connectedness. The algorithm applies fuzzy set theory to hierarchical clustering in order to discover clusters of arbitrary shape. It first partitions the data set into several sub-clusters using a partitioning method, and then constructs a fuzzy graph of the sub-clusters by analyzing the degree of fuzzy connectedness among them. Computing the λ-cut graph yields the connected components of the fuzzy graph, which form the desired clustering. The algorithm can be applied to high-dimensional data sets and finds clusters of arbitrary shape, such as spherical, linear, elongated, or concave ones. The paper also presents IFHC, an incremental version of the algorithm for periodically incremental environments. Both FHC and IFHC handle categorical as well as numerical attributes. Experimental results on data sets of arbitrary shape and size are very encouraging. An experimental study on web log files shows that the method can help discover user access patterns efficiently. The investigation demonstrates that the proposed method generates better-quality clusters than traditional algorithms and scales well to large databases.
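The λ-cut step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not define the fuzzy-connectedness degree, so the `connectedness` function below (exponential decay with the smallest distance between two sub-clusters), its `scale` parameter, and the function names are all assumptions made for illustration. The sub-clusters are taken as given, standing in for the output of the partitioning step.

```python
import math
from itertools import combinations


def connectedness(a, b, scale=1.0):
    """Hypothetical fuzzy-connectedness degree in (0, 1] between two sub-clusters.

    Assumption: the degree decays exponentially with the smallest
    point-to-point distance; the paper's own definition may differ.
    """
    d_min = min(math.dist(p, q) for p in a for q in b)
    return math.exp(-d_min / scale)


def lambda_cut_clusters(sub_clusters, lam, scale=1.0):
    """Return the connected components of the lambda-cut graph.

    Each component is a sorted list of sub-cluster indices.
    """
    parent = list(range(len(sub_clusters)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # The lambda-cut keeps only edges whose degree is at least lam.
    for i, j in combinations(range(len(sub_clusters)), 2):
        if connectedness(sub_clusters[i], sub_clusters[j], scale) >= lam:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(sub_clusters)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three sub-clusters, assumed to come from a prior partitioning step.
sub_clusters = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(2.0, 0.0), (3.0, 0.0)],
    [(10.0, 10.0), (11.0, 10.0)],
]
print(lambda_cut_clusters(sub_clusters, lam=0.3))  # → [[0, 1], [2]]
print(lambda_cut_clusters(sub_clusters, lam=0.5))  # → [[0], [1], [2]]
```

Lowering λ merges more sub-clusters and raising it splits them, so sweeping λ over a range of values gives one level of the hierarchy per threshold.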
