An Improved Multi-SOM Algorithm for Determining the Optimal Number of Clusters

Interpreting cluster quality and determining the optimal number of clusters remain crucial problems in cluster analysis. In this paper, we focus on the multi-SOM clustering approach, which overcomes the problem of extracting the number of clusters from the SOM map through the use of a clustering validity index. We test the multi-SOM algorithm on real and artificial data sets with evaluation criteria not previously used with it, such as the Davies-Bouldin index and the Silhouette index. The multi-SOM algorithm is compared to the k-means and BIRCH methods. Results obtained with the R language show that it is more efficient than these classical clustering methods.
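To make the role of a validity index concrete, the following minimal sketch computes the Davies-Bouldin index for a few candidate partitions and keeps the one with the lowest value. It is an illustration only, not the multi-SOM algorithm and not the authors' R code: the function names `davies_bouldin` and `best_partition`, the toy points, and the candidate labelings are all assumptions made for this example, and Python is used here in place of R.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard clustering (lower is better).

    Hypothetical helper for illustration. Assumes at least two clusters
    and that no two cluster centroids coincide.
    """
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance of its members to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each cluster, take its worst (largest) similarity ratio to any
    # other cluster, then average these worst cases over all clusters.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


def best_partition(points, candidates):
    """Return the key of the candidate labeling with the lowest index."""
    return min(candidates, key=lambda k: davies_bouldin(points, candidates[k]))


# Toy data: two well-separated groups of three points each (made up).
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
# Candidate labelings keyed by their number of clusters (made up).
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
}
print(best_partition(points, candidates))  # selects the 2-cluster labeling
```

In this sketch the candidate partitions are supplied by hand; in an index-driven procedure they would instead come from the clustering method under evaluation, one partition per candidate number of clusters.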
