Distance based Incremental Clustering for Mining Clusters of Arbitrary Shapes

Clustering is one of the central tasks in data mining, and distance based methods form an important class of clustering techniques. To reduce the computational and storage burden of classical clustering methods, many distance based hybrid clustering methods have been proposed. However, these methods are not suitable for cluster analysis in dynamic environments, where the underlying data distribution, and consequently the clustering structure, changes over time. In this paper, we propose a distance based incremental clustering method that can find arbitrary shaped clusters in fast-changing dynamic scenarios. The proposed method is based on the recently proposed al-SL method, which can be applied successfully to large static datasets. In the incremental version of al-SL (termed IncrementalSL), we exploit important characteristics of the al-SL method to handle frequent updates of patterns in the given dataset. The IncrementalSL method produces exactly the same clustering results as the al-SL method. To show the effectiveness of IncrementalSL on dynamically changing databases, we experimented with one synthetic dataset and one real-world dataset.
