Data clustering approaches survey and analysis

In today's world, there is a growing need to analyze data and extract information from it. Clustering is one such analytical method: it divides data into groups of similar objects. Each group is called a cluster and consists of objects that are similar to one another and dissimilar to the objects in other clusters. This paper examines and evaluates various data clustering algorithms. The two major categories of clustering approaches are partitional and hierarchical clustering. The algorithms discussed here are the k-means clustering algorithm, the hierarchical clustering algorithm, the density-based clustering algorithm, the self-organizing map (SOM) algorithm, and the expectation-maximization (EM) clustering algorithm. Each algorithm is explained and analyzed according to factors such as the size of the dataset, the type of dataset, the number of clusters created, quality, accuracy, and performance. This paper also describes the tools used to implement these clustering approaches. These software tools are discussed to help beginners and new researchers understand how the approaches work, so that they can develop new products and improved approaches.
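As a minimal illustration of the clustering idea described above (objects similar within a cluster, dissimilar across clusters), the following Python sketch implements the standard Lloyd iteration for k-means, one of the algorithms the survey covers. It is not the paper's own implementation. The function names `kmeans` and `_dist2` are hypothetical, and the deterministic "first k points" initialization is a simplifying assumption; practical implementations typically use random or k-means++ seeding.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points (hypothetical helper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate an assignment step and a centroid-update step.

    Assumption: the first k points serve as the initial centroids.
    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        # Stop once neither the assignments nor the centroids change.
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels
```

For example, `kmeans([(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)], 2)` separates the three points near the origin from the three points near (8.5, 8.5). Note that k must be supplied in advance, which is one reason the survey treats the number of clusters as a comparison factor.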
