Kernel-based Self-organized Maps Trained with Supervised Bias for Gene Expression Data Analysis

Self-Organized Maps (SOMs) are a popular approach for analyzing genome-wide expression data. However, most SOM based approaches ignore prior knowledge about functional gene categories. Also, Self Organized Map (SOM) based approaches usually develop topographic maps with disjoint and uniform activation regions that correspond to a hard clustering of the patterns at their nodes. We present a novel Self-Organizing map, the Kernel Supervised Dynamic Grid Self-Organized Map (KSDG-SOM). This model adapts its parameters in a kernel space. Gaussian kernels are used and their mean and variance components are adapted in order to optimize the fitness to the input density. The KSDG-SOM also grows dynamically up to a size defined with statistical criteria. It is capable of incorporating a priori information for the known functional characteristics of genes. This information forms a supervised bias at the cluster formation and the model owns the potentiality of revising incorrect functional labels. The new method overcomes the main drawbacks of most of the existing clustering methods that lack a mechanism for dynamical extension on the basis of a balance between unsupervised and supervised drives.