Clustering Algorithms Research

This paper summarizes the current state of research on clustering algorithms and recent progress in the field. First, several representative clustering algorithms are analyzed and summarized in terms of their underlying ideas, key techniques, advantages, and disadvantages. Second, several typical clustering algorithms and well-known data sets are selected, and simulation experiments evaluate them in terms of both accuracy and running efficiency; the behavior of each algorithm across different data sets is analyzed and compared with the results of different algorithms on the same data set. Finally, by combining these two lines of analysis, the paper discusses current research hotspots, difficulties, and shortcomings in data clustering, along with some open problems. This work can serve as a useful reference for research on data clustering and data mining.
