Finding Multiple Coherent Biclusters in Microarray Data Using Variable String Length Multiobjective Genetic Algorithm

Microarray technology enables the simultaneous monitoring of the expression patterns of a huge number of genes across different experimental conditions. Biclustering of microarray data is an important technique for discovering groups of genes that are coregulated across a subset of conditions. Biclustering algorithms must identify biclusters that are both coherent and nontrivial, i.e., biclusters with a low mean squared residue and a high row variance. A multiobjective genetic biclustering technique is proposed here that optimizes these two objectives simultaneously. A novel encoding scheme that uses variable chromosome length is developed. Moreover, a new quantitative measure for evaluating the goodness of biclusters is proposed. The performance of the proposed algorithm has been evaluated on both simulated and real-life gene expression datasets and compared with several other well-known biclustering techniques.
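The two objectives named above can be made concrete with a small sketch. The abstract does not give its formulas, so this code assumes the definitions commonly used in the biclustering literature. For a bicluster with rows I and columns J, the mean squared residue follows Cheng and Church: the mean of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ is the row mean, a_Ij is the column mean and a_IJ is the overall mean. The row variance is the mean of (a_ij - a_iJ)^2. The functions `submatrix`, `mean_squared_residue`, `row_variance` and `dominates` are hypothetical illustrations. They do not represent the authors' variable-length encoding or their new quality measure.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def submatrix(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Hypothetical helper: extract the bicluster (selected rows x selected columns)."""
    return [[data[i][j] for j in cols] for i in rows]


def mean_squared_residue(b: Matrix) -> float:
    """Assumed Cheng-Church MSR; 0 for a perfectly additive (coherent) bicluster."""
    n_rows, n_cols = len(b), len(b[0])
    row_means = [sum(row) / n_cols for row in b]
    col_means = [sum(b[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = b[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


def row_variance(b: Matrix) -> float:
    """Assumed row variance: mean squared deviation of each entry from its row mean.

    A high value means the rows actually fluctuate, i.e. the bicluster is nontrivial.
    """
    n_rows, n_cols = len(b), len(b[0])
    total = 0.0
    for row in b:
        mean = sum(row) / n_cols
        total += sum((x - mean) ** 2 for x in row)
    return total / (n_rows * n_cols)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance on (MSR, row variance): minimize MSR, maximize row variance."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better
```

Two cases illustrate why both objectives are needed. The rows [1, 2, 3] and [4, 5, 6] differ only by a constant shift, so they form a perfectly coherent bicluster with an MSR of 0. A constant block such as [5, 5, 5, 5] also has an MSR of 0, but its row variance is 0 as well. That is the trivial case the row-variance objective is meant to reject. A multiobjective genetic algorithm keeps the candidates that no other candidate dominates in the sense of `dominates`, rather than collapsing the two scores into one weighted sum.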
