An evolutionary algorithm for discovering biclusters in gene expression data of breast cancer

Analyzing gene expression data of breast cancer is important for discovering signatures that can classify tumor subtypes and predict prognosis. Biclustering algorithms have been shown to group genes with similar expression patterns across a subset of samples, which makes them well suited to analyzing cancer microarray data. In this study, we propose a new biclustering algorithm based on an evolutionary search procedure. The search operates on the conditions: it explores combinations of conditions that may define a bicluster. Preliminary results on synthetic and real yeast data sets show that our algorithm outperforms several existing ones. We have also applied the method to real breast cancer microarray data sets and found several biclusters that can be used as signatures for differentiating tumor types.
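The idea of searching over combinations of conditions can be illustrated with a minimal sketch. It is not the algorithm proposed in this study: the bit-mask encoding, the coherence test (a gene's values differ by at most `tol` across the chosen conditions), the area-based fitness, and all function names and parameter values below are assumptions made for illustration only.

```python
import random

def coherent_genes(data, conds, tol):
    """Return indices of genes whose expression varies by at most tol
    across the chosen conditions (an assumed coherence criterion)."""
    if not conds:
        return []
    genes = []
    for g, row in enumerate(data):
        vals = [row[c] for c in conds]
        if max(vals) - min(vals) <= tol:
            genes.append(g)
    return genes

def fitness(data, mask, tol):
    """Assumed fitness: bicluster area (genes x conditions).
    Masks selecting fewer than two conditions score zero."""
    conds = [c for c, bit in enumerate(mask) if bit]
    if len(conds) < 2:
        return 0
    return len(coherent_genes(data, conds, tol)) * len(conds)

def evolve(data, tol=0.5, pop_size=30, generations=60, p_mut=0.1, seed=0):
    """Evolve bit masks over conditions; return (genes, conditions)
    for the best mask found. Requires at least two conditions."""
    rng = random.Random(seed)
    n_cond = len(data[0])
    score = lambda m: fitness(data, m, tol)
    pop = [[rng.randint(0, 1) for _ in range(n_cond)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask forward
        while len(new_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_cond)
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=score)
    conds = [c for c, bit in enumerate(best) if bit]
    return coherent_genes(data, conds, tol), conds

# Toy matrix (rows = genes, columns = conditions); genes 0-2 are
# coherent under conditions 0, 2 and 4.
toy = [
    [1.0, 5.0, 1.1, 9.0, 1.0],
    [2.0, 7.0, 2.1, 3.0, 2.0],
    [3.0, 0.0, 3.0, 8.0, 3.1],
    [4.0, 9.0, 0.0, 1.0, 7.0],
]
genes, conds = evolve(toy)
```

Encoding only the conditions keeps the search space small when samples are far fewer than genes, since the gene set is derived from each candidate condition set rather than evolved alongside it.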
