Improvements for Determining the Number of Clusters in k-Means for Innovation Databases in SMEs

Abstract

Automatic Clustering using Differential Evolution (ACDE) is a clustering method that can determine the number of clusters automatically. However, ACDE still relies on a manual strategy to set the activation threshold of k, which affects its performance. In this study, this limitation is addressed by using the U Control Chart (UCC). The proposed method was tested on five datasets on the innovative capacity of Small and Medium-sized Enterprises (SMEs). The data came from the National Administrative Department of Statistics (DANE, Departamento Administrativo Nacional de Estadística) and the Ministry of Commerce, Industry, and Tourism of Colombia. Results were evaluated with the Davies-Bouldin Index (DBI) and the Cosine Similarity (CS) measure. On most datasets, the proposed method outperforms prior studies, finding the optimal number of clusters with the lowest DBI and CS values. It can be concluded that UCC can determine the k activation threshold in ACDE, which enables effective determination of the number of clusters for k-means clustering.
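The abstract names three mechanisms without showing them: how an ACDE chromosome yields a number of clusters, how a U chart is computed, and how DBI scores a partition. The Python sketch below (NumPy assumed) illustrates each one under stated assumptions. Chromosome decoding follows the classic ACDE rule, where a candidate centroid is active when its activation threshold exceeds a cutoff (0.5 by default). The U chart uses standard 3-sigma limits, and DBI follows its textbook definition. The paper's exact mapping from UCC limits to the activation threshold is not given here, so the `cutoff` argument only marks where such a value would plug in. All function names are hypothetical, and the two-centroid repair step is a simplification.

```python
import numpy as np


def u_chart_limits(defects, units, z=3.0):
    """U chart (nonconformities per unit): centre line and per-sample limits.

    u_bar = sum(defects) / sum(units); limits are u_bar -/+ z*sqrt(u_bar/n_i),
    with the lower limit clipped at 0 because a rate cannot be negative.
    """
    defects = np.asarray(defects, dtype=float)
    units = np.asarray(units, dtype=float)
    u_bar = defects.sum() / units.sum()
    half_width = z * np.sqrt(u_bar / units)
    lcl = np.maximum(0.0, u_bar - half_width)
    ucl = u_bar + half_width
    return u_bar, lcl, ucl


def active_centroids(thresholds, centroids, cutoff=0.5):
    """Decode an ACDE-style chromosome: centroid j is active when T_j > cutoff.

    cutoff=0.5 is the classic fixed rule; a data-driven value (e.g. one
    derived from a U chart, mapping assumed) can be passed instead. If fewer
    than two centroids are active, this sketch keeps the two with the largest
    thresholds (a simplification, not the original repair step).
    """
    thresholds = np.asarray(thresholds, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    mask = thresholds > cutoff
    if mask.sum() < 2:
        mask = np.zeros(len(thresholds), dtype=bool)
        mask[np.argsort(thresholds)[-2:]] = True
    return centroids[mask]


def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def davies_bouldin(X, labels):
    """Davies-Bouldin Index (lower is better).

    s_i is the mean distance of cluster i's points to its centroid c_i;
    DBI = mean_i max_{j != i} (s_i + s_j) / ||c_i - c_j||.
    """
    ks = np.unique(labels)
    if len(ks) < 2:
        raise ValueError("DBI needs at least two non-empty clusters")
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, cents)])
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(ks)) if j != i)
             for i in range(len(ks))]
    return float(np.mean(worst))


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    # Hypothetical chromosome: 3 candidate centroids with activation thresholds.
    T = [0.9, 0.2, 0.7]
    C = [[0.0, 1.0], [5.0, 5.0], [10.0, 1.0]]
    centroids = active_centroids(T, C, cutoff=0.5)
    labels = assign(X, centroids)
    print("k =", len(centroids), "DBI =", davies_bouldin(X, labels))
```

On the toy data, two centroids are active. They split the four points into two tight, well-separated pairs, so the script prints `k = 2 DBI = 0.2`. A lower DBI means more compact, better-separated clusters, which is why the lowest DBI is treated as the best result.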
