Seed Point Selection Algorithm in Clustering of Image Data

Massive amounts of data are being collected in almost all sectors of life due to recent technological advancements. Data mining tools, including clustering, are often applied to huge data sets to extract hidden and previously unknown information that can help future decision-making. Clustering is an unsupervised technique that separates data points into homogeneous groups. The seed point, regarded as the core of a cluster, is an important feature of a clustering technique, and the performance of a seed-based clustering technique depends on the choice of initial cluster centers. Selecting initial seed points is challenging because the selection must yield a good cluster partition while converging rapidly. In this work we propose a seed point selection algorithm based on the maximization of Shannon’s entropy with a distance restriction criterion, and apply it to image data, using the RGB features of color images, as well as to 2D data. Our seed point selection algorithm converges in few steps while forming good clusters. We have applied the algorithm to several image data sets as well as to discrete data, and the results appear satisfactory. We also compare it with other seed selection methods applied through the K-Means algorithm in terms of number of iterations and CPU time.
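To make the idea of entropy-guided seeding with a distance restriction concrete, the following is a minimal sketch and not the algorithm of this paper, whose details the abstract does not give. It assumes that each point is scored by the Shannon entropy term -p log2 p of the coarse bin it falls into, and that seeds are picked greedily in descending score order while staying at least `min_dist` apart. The function name `select_seeds`, the parameters `min_dist` and `bin_width`, and the binning scheme are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def select_seeds(points, k, min_dist, bin_width=32):
    """Greedy, entropy-ranked seed selection (illustrative assumption only).

    points    : list of equal-length numeric tuples, e.g. (R, G, B) pixels
    k         : maximum number of seeds to return
    min_dist  : minimum Euclidean distance required between any two seeds
    bin_width : width of the quantization bins used to estimate probabilities
    """
    # Quantize every point into a coarse bin to estimate a discrete distribution.
    bins = [tuple(int(c // bin_width) for c in p) for p in points]
    counts = Counter(bins)
    n = len(points)

    # Shannon entropy term of each bin: -p * log2(p).
    term = {b: -(c / n) * math.log2(c / n) for b, c in counts.items()}

    # Rank points by the entropy term of their bin (stable sort keeps input order on ties).
    order = sorted(range(n), key=lambda i: -term[bins[i]])

    seeds = []
    for i in order:
        candidate = points[i]
        # Distance restriction: reject candidates too close to an existing seed.
        if all(math.dist(candidate, s) >= min_dist for s in seeds):
            seeds.append(candidate)
            if len(seeds) == k:
                break
    return seeds


# Hypothetical toy data: six RGB pixels forming three color groups.
pixels = [(250, 10, 10), (245, 12, 8), (10, 240, 12),
          (12, 250, 9), (8, 9, 245), (240, 15, 12)]
seeds = select_seeds(pixels, k=3, min_dist=100.0)
```

The returned seeds could then be passed to K-Means as initial centers. Note that -p log2 p is not monotone in p, so this ranking is only a heuristic stand-in for an entropy maximization criterion.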
