Comparisons Between Data Clustering Algorithms

Clustering is the division of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to the objects of other groups. This paper studies and compares four data clustering algorithms: k-means, hierarchical clustering, self-organizing maps, and expectation-maximization clustering. The algorithms are compared according to the following factors: size of the dataset, number of clusters, type of dataset, and type of software used. The conclusions drawn concern the performance, quality, and accuracy of the clustering algorithms.
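As an illustration of what "division of data into groups of similar objects" means in practice, the following is a minimal sketch of k-means, one of the four algorithms named above. It is not the implementation evaluated in the paper: the function name `kmeans`, the toy `points`, the Euclidean distance, and the choice to seed the centroids with the first k points are all assumptions made here to keep the example short and deterministic.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20):
    """Minimal k-means (Lloyd's algorithm) sketch.

    Assumption: centroids are seeded with the first k points so that
    the result is deterministic; practical implementations usually
    use random or k-means++ seeding instead.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two visibly separated groups in the plane.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

Points near (1, 1) end up in one cluster and points near (8, 8) in the other, so objects within a cluster are close to each other and far from those in the other cluster. The factors the paper varies, such as dataset size and number of clusters, correspond here to the length of `points` and the value of `k`.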
