Analysis of microarray data using multiobjective variable string length genetic fuzzy clustering

In this article, a novel multiobjective, variable string length, real-coded genetic fuzzy clustering scheme is proposed for clustering microarray gene expression data. The technique automatically evolves the number of clusters along with the clustering result. Each chromosome encodes a set of cluster centers, and two fuzzy cluster validity indices, the PBM index and the Xie-Beni (XB) index, are optimized simultaneously. The final generation yields a set of non-dominated solutions, from which the best solution is selected using the Silhouette index, a validity measure that is independent of the number of clusters. The length of the selected chromosome gives the number of clusters. The method is applied to three publicly available real-life gene expression data sets, and its superiority over several well-known clustering algorithms is demonstrated quantitatively.
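The pipeline described above (decode a variable-length chromosome into cluster centers, score it on two fuzzy validity indices, keep the non-dominated solutions, then pick one by Silhouette) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: memberships follow the fuzzy c-means update with fuzzifier m = 2; XB uses squared memberships as in Xie and Beni's original definition; the fuzzy PBM form weights within-cluster distances by memberships (power 1); the Silhouette index is computed on the crisp labelling obtained by assigning each gene to its highest-membership cluster. The genetic search itself (crossover, mutation, ranking) is omitted, and all function names (`decode`, `evaluate`, `select_best`, etc.) are hypothetical.

```python
import math
from itertools import combinations

def decode(chrom, dim):
    """Split a flat real-coded chromosome into K = len(chrom) // dim centers (assumes K >= 2)."""
    assert len(chrom) % dim == 0 and len(chrom) // dim >= 2, "need K >= 2 centers"
    return [list(chrom[i:i + dim]) for i in range(0, len(chrom), dim)]

def memberships(centers, data, m=2.0):
    """Fuzzy c-means style memberships: u[i][k] is the membership of point k in cluster i."""
    u = [[0.0] * len(data) for _ in centers]
    for k, x in enumerate(data):
        d = [math.dist(z, x) for z in centers]
        if min(d) == 0.0:
            # Point coincides with a center: give it full membership there.
            u[d.index(0.0)][k] = 1.0
            continue
        for i in range(len(centers)):
            u[i][k] = 1.0 / sum((d[i] / dj) ** (2.0 / (m - 1.0)) for dj in d)
    return u

def xie_beni(centers, data, u):
    """XB index (minimized): fuzzy compactness over n times the minimum squared center separation."""
    num = sum(u[i][k] ** 2 * math.dist(z, x) ** 2
              for i, z in enumerate(centers) for k, x in enumerate(data))
    sep = min(math.dist(a, b) ** 2 for a, b in combinations(centers, 2))
    return num / (len(data) * sep) if sep > 0 else math.inf

def pbm(centers, data, u):
    """PBM index (maximized): (1/K * E1/EK * DK) ** 2."""
    dim = len(data[0])
    mean = [sum(x[j] for x in data) / len(data) for j in range(dim)]
    e1 = sum(math.dist(x, mean) for x in data)  # spread around the single overall center
    ek = sum(u[i][k] * math.dist(z, x)           # membership-weighted within-cluster spread
             for i, z in enumerate(centers) for k, x in enumerate(data))
    dk = max(math.dist(a, b) for a, b in combinations(centers, 2))  # largest center separation
    return (e1 / (len(centers) * ek) * dk) ** 2 if ek > 0 else math.inf

def evaluate(chrom, data, dim):
    """Objective vector (PBM, XB) of one chromosome."""
    centers = decode(chrom, dim)
    u = memberships(centers, data)
    return pbm(centers, data, u), xie_beni(centers, data, u)

def dominates(a, b):
    """Pareto dominance for (PBM, XB) pairs: PBM is maximized, XB is minimized."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def non_dominated(scores):
    """Indices of the scores that no other score dominates."""
    return [i for i, s in enumerate(scores) if not any(dominates(t, s) for t in scores)]

def silhouette(data, labels):
    """Mean silhouette width of a crisp labelling; singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated here as the worst score
    total = 0.0
    for k, x in enumerate(data):
        own = [math.dist(x, y) for j, y in enumerate(data) if labels[j] == labels[k] and j != k]
        if not own:
            continue
        a = sum(own) / len(own)
        b = min(sum(math.dist(x, y) for j, y in enumerate(data) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[k])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(data)

def select_best(front, data, dim):
    """Return the non-dominated chromosome with the highest Silhouette index."""
    best, best_s = None, -math.inf
    for chrom in front:
        centers = decode(chrom, dim)
        u = memberships(centers, data)
        # Defuzzify: assign each point to its highest-membership cluster.
        labels = [max(range(len(centers)), key=lambda i: u[i][k]) for k in range(len(data))]
        s = silhouette(data, labels)
        if s > best_s:
            best, best_s = chrom, s
    return best, best_s

# Toy run: two well-separated 2-D groups, candidate chromosomes with K = 2 and K = 3.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
population = [[0.3, 0.3, 10.3, 10.3], [0.3, 0.3, 10, 10.5, 10.5, 10]]
scores = [evaluate(c, data, dim=2) for c in population]
front = [population[i] for i in non_dominated(scores)]
best, s = select_best(front, data, dim=2)
print("clusters:", len(best) // 2, "silhouette:", round(s, 3))
```

When the front holds chromosomes of different lengths, the Silhouette step is what fixes K. In this toy data the K = 2 chromosome has both a higher PBM and a lower XB than the K = 3 one, so the front has a single member and the selected chromosome length gives two clusters.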
