A Granular Self-Organizing Map for Clustering and Gene Selection in Microarray Data

A new granular self-organizing map (GSOM) is developed by integrating the concept of a fuzzy rough set with the SOM. While training the GSOM, the weights of the winning neuron and its neighborhood neurons are updated through a modified learning procedure, in which the neighborhood is newly defined using fuzzy rough sets. The clusters (granules) evolved by the GSOM are presented to a decision table as its decision classes, and a gene selection method is developed on the basis of this decision table. The effectiveness of the GSOM is shown both in clustering samples and in developing an unsupervised fuzzy rough feature selection (UFRFS) method for gene selection in microarray data. Compared with related clustering methods, the GSOM gives superior results in terms of the $\beta$-index, DB-index, Dunn-index, and fuzzy rough entropy. The genes selected by the UFRFS are not only better in terms of classification accuracy and a feature evaluation index, but also statistically more significant than those selected by related unsupervised methods. The C code of the GSOM and UFRFS is available online at http://avatharamg.webs.com/software-code.
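The GSOM modifies the usual SOM training step, so it may help to see that baseline. The sketch below shows only a standard Kohonen SOM: the winner is the neuron with the nearest weight vector (Euclidean distance), and the winner and its grid neighbours move toward the input with a Gaussian neighbourhood. It does not implement the GSOM's fuzzy-rough neighbourhood or its modified learning rule, which the abstract does not specify, and it is not the authors' released C code. The function names (`find_winner`, `som_step`, `train_som`, `assign_clusters`), the linear decay schedules, and the parameter values are illustrative assumptions.

```python
import math
import random


def find_winner(weights, x):
    """Return the index of the neuron whose weight vector is nearest to x (Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def som_step(weights, grid, x, alpha, sigma):
    """One standard Kohonen update, applied in place.

    The winner and its neighbours on the map move toward x; each neuron's step
    is scaled by a Gaussian of its grid distance to the winner. (Assumption:
    this is the plain SOM rule; the GSOM's fuzzy-rough neighbourhood is not
    reproduced here.)
    """
    c = find_winner(weights, x)
    for j, w in enumerate(weights):
        d2 = (grid[j][0] - grid[c][0]) ** 2 + (grid[j][1] - grid[c][1]) ** 2
        h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood value
        for k in range(len(w)):
            w[k] += alpha * h * (x[k] - w[k])
    return c


def train_som(data, init_weights, grid, epochs=30, alpha0=0.5, sigma0=1.0, seed=0):
    """Train for a fixed number of epochs with linearly decaying alpha and sigma
    (a hypothetical schedule chosen for illustration)."""
    rng = random.Random(seed)
    weights = [list(w) for w in init_weights]  # copy so the caller's list is untouched
    for t in range(epochs):
        frac = t / epochs
        alpha = alpha0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # keep a minimal neighbourhood
        for x in rng.sample(data, len(data)):  # visit samples in random order
            som_step(weights, grid, x, alpha, sigma)
    return weights


def assign_clusters(weights, data):
    """Each sample's cluster (granule) is the index of its winning neuron."""
    return [find_winner(weights, x) for x in data]


if __name__ == "__main__":
    data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    grid = [(0, 0), (0, 1)]  # a 1 x 2 map
    weights = train_som(data, [[0.0, 0.0], [5.0, 5.0]], grid)
    print(assign_clusters(weights, data))
```

The printed list gives the winning neuron for each toy sample. In the GSOM, the clusters obtained this way play the role of decision classes in the decision table used for gene selection; the neighbourhood function `h` above is the part the GSOM replaces with a fuzzy-rough definition.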
