Feature Selection for Self-Organizing Map

In this paper, we present a new heuristic measure for optimizing the database used as the input layer of a Self-Organizing Map (SOM). This heuristic, called HI-SOM (Heuristic Input for SOM), selects the variables used for clustering with the SOM algorithm. HI-SOM identifies and selects the important variables in the feature space, so that redundant variables and those that do not carry enough relevant information are eliminated. The proposed measure is used within the SOM learning algorithm to reduce the dimension of the database; HI-SOM thus selects the important variables with which to train the "best" SOM. We illustrate the method on three databases from a public data set repository and show that it is effective at identifying the important variables, which yield homogeneous clusters.
