Improved K-Means Algorithm on Home Industry Data Clustering in the Province of Bangka Belitung

The Government of Bangka Belitung Islands Province has not yet classified its home industries. To address this gap, we propose using the k-means algorithm to cluster home industry data. K-means is widely used because it is simple and well suited to grouping data. Its weakness lies in the choice of initial cluster centers, which is made at random: if the randomly chosen initial centroids are poor, the resulting grouping is suboptimal. Internal cluster validation is one way to determine the optimal number of clusters without prior information about the data. This study aims to identify the optimal grouping by improving the k-means algorithm and then testing it with two internal cluster validation indices, the Davies-Bouldin Index (DBI) and the Silhouette Index (SI), on home industry data from Bangka Belitung Islands Province. With the improved k-means algorithm, both the SI and the DBI indicate k = 3 as the optimal number of clusters, whereas with the traditional k-means algorithm both indices indicate k = 2. We conclude that k = 3, as assessed by the Davies-Bouldin Index, gives good results for clustering home industry data in Bangka Belitung Islands Province.
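
The selection of k by internal validation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the initial centroids are improved, so a deterministic farthest-point initialization stands in for it, and the toy two-dimensional points are synthetic, not the home industry data. The helper names (init_farthest, kmeans, davies_bouldin, silhouette) are hypothetical. The sketch runs k-means for several values of k and reports the DBI (lower is better) and the SI (higher is better).

```python
import math

def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def init_farthest(data, k):
    # ASSUMPTION: stand-in for an "improved" initialization, not the paper's method.
    # Start at the point nearest the overall mean, then repeatedly add the
    # point farthest from all centers chosen so far.
    centre = mean(data)
    centers = [min(data, key=lambda p: math.dist(p, centre))]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(data, centers, max_iter=100):
    """Lloyd iterations: assign points to nearest center, then recompute centers."""
    centers, labels = list(centers), None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, l in zip(data, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean(members)
    return labels, centers

def davies_bouldin(data, labels, centers):
    """DBI: mean over clusters of the worst (scatter_i + scatter_j) / d(c_i, c_j).
    Assumes every cluster is non-empty and centers are distinct."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(data, labels) if l == j]
        scatter.append(sum(math.dist(p, centers[j]) for p in members) / len(members))
    return sum(max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                   for j in range(k) if j != i) for i in range(k)) / k

def silhouette(data, labels):
    """Mean silhouette: (b - a) / max(a, b) per point; singletons score 0."""
    k, scores = max(labels) + 1, []
    for i, p in enumerate(data):
        sums, counts = [0.0] * k, [0] * k
        for m, q in enumerate(data):
            if m != i:
                sums[labels[m]] += math.dist(p, q)
                counts[labels[m]] += 1
        own = labels[i]
        if counts[own] == 0:
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[j] / counts[j] for j in range(k) if j != own and counts[j] > 0)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic toy data: three well-separated groups (not the paper's data).
data = [(1, 1), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
        (5, 5), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
        (9, 1), (9.2, 0.8), (8.8, 1.1), (9.1, 1.2)]

results = {}
for k in range(2, 6):
    labels, centers = kmeans(data, init_farthest(data, k))
    results[k] = (davies_bouldin(data, labels, centers), silhouette(data, labels))
    print(f"k={k}  DBI={results[k][0]:.3f}  SI={results[k][1]:.3f}")

best_dbi = min(results, key=lambda k: results[k][0])  # lowest DBI wins
best_si = max(results, key=lambda k: results[k][1])   # highest SI wins
print("best k by DBI:", best_dbi, "| best k by SI:", best_si)
```

Because the initialization in this sketch is deterministic, repeated runs give the same clustering, which is the property a non-random choice of starting centroids is meant to provide. On real data the two indices need not agree on the same k.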
