Gene expression data clustering using tree-like SOMs with evolving splitting-merging structures

The paper presents an application of our clustering technique, which uses generalized tree-like SOMs with evolving splitting-merging structures, to complex clustering tasks, in particular the sample-based and gene-based clustering of the Lymphoma human cancer microarray data set. It is worth emphasizing that our approach works in a fully unsupervised way, i.e., it uses unlabelled data and does not require the number of clusters to be predefined. This is particularly important in the gene-based clustering of microarray data, for which the number of gene clusters is not known in advance. In the sample-based clustering of the Lymphoma data set, our approach gives better results than those reported in the literature (some of the alternative methods additionally require the number of clusters to be defined in advance). In the gene-based clustering of the considered microarray data, our approach generates clusters that are easily divisible into subclusters related to particular sample classes. In this sense, it corresponds to subspace clustering, which is highly desirable in microarray data analysis.
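The abstract gives no algorithmic detail, so what follows is only a toy Python sketch, not the authors' algorithm, of the general idea that "splitting" a one-dimensional chain of SOM neurons lets the number of clusters emerge from the data instead of being fixed in advance. It trains a plain Kohonen chain and then cuts links that are much longer than average. The function names `train_chain_som` and `split_chain`, the cut rule `alpha * mean link length`, and all parameter values are hypothetical choices made for illustration. Merging of sub-chains and the tree-like generalization are not modelled here.

```python
import math
import random


def train_chain_som(data, n_neurons=9, epochs=40, lr0=0.5, radius0=3.0, seed=0):
    """Train a 1-D chain of prototypes (a plain Kohonen SOM with chain topology).

    Hypothetical toy helper; not the method described in the paper.
    """
    rng = random.Random(seed)
    # Initialise prototypes on randomly chosen data points.
    weights = [list(rng.choice(data)) for _ in range(n_neurons)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac) + 0.01          # learning rate decays linearly
        radius = radius0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            # Winner: the prototype closest to the sample.
            winner = min(range(n_neurons), key=lambda j: math.dist(weights[j], x))
            for j in range(n_neurons):
                # Gaussian neighbourhood measured along the chain (index distance).
                h = math.exp(-((j - winner) ** 2) / (2.0 * radius ** 2))
                weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


def split_chain(weights, alpha=2.0):
    """Cut links longer than alpha * mean link length; return index segments.

    Hypothetical splitting rule chosen for illustration only.
    """
    if len(weights) < 2:
        return [list(range(len(weights)))]
    # Lengths of the topological links between consecutive neurons.
    lengths = [math.dist(weights[i], weights[i + 1]) for i in range(len(weights) - 1)]
    mean_len = sum(lengths) / len(lengths)
    segments, current = [], [0]
    for i, length in enumerate(lengths):
        if length > alpha * mean_len:
            segments.append(current)  # close the current sub-chain
            current = []
        current.append(i + 1)
    segments.append(current)
    return segments


if __name__ == "__main__":
    rng = random.Random(1)
    # Three synthetic 2-D groups (made-up data, not microarray data).
    centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in centres for _ in range(30)]
    protos = train_chain_som(data)
    segments = split_chain(protos)
    print(len(segments), "sub-chains:", segments)
```

Each resulting sub-chain plays the role of one cluster, so the cluster count is read off the trained structure rather than supplied as a parameter. With well-separated groups the count often matches the number of groups, but this toy version gives no such guarantee; it depends on `n_neurons`, `alpha`, and the training schedule.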
