An Analysis on Clustering Algorithms in Data Mining

Clustering is the grouping together of similar data items into clusters. Clustering analysis is one of the main analytical methods in data mining; the method of clustering algorithm will influence the clustering results directly. This paper discusses the various types of algorithms like k-means clustering algorithms, etc.... and analyzes the advantages and shortcomings of the various algorithms. In each type we can calculate the distance between each data object and all cluster centers in each iteration, which makes the efficiency of clustering is not high. This paper provides a broad survey of the most basic techniques and identifies .This paper also deals with the issues of clustering algorithm such as time complexity and accuracy to provide the better results based on various environments. The results are discussed on huge datasets.

[1]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[2]  Carla E. Brodley,et al.  Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach , 2003, ICML.

[3]  Zhang Chun-ping,et al.  Research on K-means Clustering Algorithm , 2011 .

[4]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[5]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[6]  Huan Liu,et al.  Subspace clustering for high dimensional data: a review , 2004, SKDD.

[7]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[8]  Dino Isa,et al.  Using the self organizing map for clustering of text documents , 2009, Expert Syst. Appl..

[9]  Sudipto Guha,et al.  Clustering Data Streams: Theory and Practice , 2003, IEEE Trans. Knowl. Data Eng..

[10]  Vijay V. Raghavan,et al.  A clustering strategy based on a formalism of the reproductive process in natural systems , 1979, SIGIR '79.

[11]  Hans-Peter Kriegel,et al.  Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering , 2009, TKDD.

[12]  Pedro Hípola G. Salton, Automatic text processing: The Transformation Analysis and Retrieval of Information by Computer , 1991 .

[13]  Dimitrios Gunopulos,et al.  Subspace Clustering of High Dimensional Data , 2004, SDM.