A dynamic K-means clustering for data mining

Received Sep 25, 2018; Revised Nov 24, 2018; Accepted Dec 8, 2018

Data mining is the process of discovering structure in large data sets. With this process, decision makers can make informed decisions when addressing real-world problems. Several data clustering techniques are used in data mining to find specific patterns in data. The K-means method is one of the best-known techniques for clustering large data sets. It partitions the data set under the assumption that the number of clusters is fixed in advance. The main problem with this method is choosing that number. If the number of clusters is chosen too small, there is a higher probability that dissimilar items are placed in the same group. If it is chosen too large, there is a higher chance that similar items end up in different groups.

In this paper, we address this issue by proposing a new K-means clustering algorithm that performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-means, and the number of clusters is formed based on this value. At each iteration of K-means, if the Euclidean distance between two data points is less than or equal to the threshold value, the two points are placed in the same group. Otherwise, the proposed method creates a new cluster for the dissimilar data point. The results show that the proposed method outperforms the original K-means method.
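To make the fixed-cluster assumption of standard K-means concrete, the following is a minimal Python sketch of the usual Lloyd-style iteration, in which k must be chosen before clustering starts. The helper names (euclidean, mean, kmeans) and the use of the first k points as initial centroids are illustrative assumptions made for determinism, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Standard K-means with a fixed number of clusters k.

    Assumption: the first k points are used as initial centroids so that the
    result is deterministic (real implementations usually pick them randomly).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: every point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(points, 2))
```

With k=2 the two well-separated pairs end up in separate clusters, while kmeans(points, 1) forces all four points into one group, which is the "too few clusters" problem the abstract describes.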

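The dynamic idea in the abstract (join a group when the Euclidean distance is within a threshold, otherwise open a new cluster) can be illustrated with a single-pass, leader-style threshold clustering, sketched below. The abstract does not specify how the threshold is computed or which point a new item is compared against, so this hypothetical sketch takes the threshold as a parameter and compares each point with the current centroid of its nearest existing cluster. The function name threshold_clustering is illustrative; this is not the authors' algorithm.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def threshold_clustering(
    points: List[Sequence[float]], threshold: float
) -> Tuple[List[Point], List[int]]:
    """Hypothetical threshold-based clustering; the number of clusters is not fixed.

    A point joins its nearest existing cluster when the Euclidean distance to that
    cluster's centroid is <= threshold; otherwise it seeds a new cluster.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    centroids: List[Point] = []
    members: List[List[Point]] = []
    labels: List[int] = []
    for raw in points:
        p = tuple(float(x) for x in raw)
        if centroids:
            nearest = min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
            if euclidean(p, centroids[nearest]) <= threshold:
                # Similar enough: add to the existing cluster and update its centroid.
                members[nearest].append(p)
                centroids[nearest] = mean(members[nearest])
                labels.append(nearest)
                continue
        # First point, or dissimilar to every existing cluster: open a new cluster.
        centroids.append(p)
        members.append([p])
        labels.append(len(centroids) - 1)
    return centroids, labels


data = [(0, 0), (0, 1), (10, 10), (10, 11), (30, 30)]
print(threshold_clustering(data, 2.0))
```

With a threshold of 2.0 the five example points form three clusters, while a threshold of 20.0 yields only two, so the threshold rather than a preset k controls how many clusters appear. Because clusters are opened in one pass, the result can depend on input order; a fuller implementation might follow this pass with ordinary K-means iterations using the discovered number of clusters.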