Parallel identification of gene biclusters with coherent evolutions

Finding clusters of genes whose expression levels evolve coherently under subsets of conditions can help uncover genetic pathways. Such clusters can be found by applying a biclustering procedure to gene expression data. Given a microarray data set with M genes and N conditions, we define a bicluster with coherent evolution as a subset of genes whose expression levels are nondecreasing as a function of a particular ordered subset of conditions. We propose a new biclustering procedure that identifies, in parallel, all biclusters with a specified number K of conditions, with O(MK) complexity. Unlike almost all prior biclustering techniques, the proposed approach is guaranteed to find all biclusters in the data set that have specified minimum numbers of genes and conditions. The biclusters it identifies have no imperfections, i.e., the evolutions of the genes in each bicluster are coherent across all conditions in the bicluster. Furthermore, the complexity of the proposed approach is lower than that of prior approaches.
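To make the definition of a bicluster with coherent evolution concrete, the following minimal sketch enumerates every ordered subset of K conditions and collects the genes whose expression levels are nondecreasing along it. It is a brute-force illustration of the definition only, not the proposed procedure, and it does not achieve O(MK) complexity. The function name `coherent_biclusters`, the `min_genes` parameter, and the toy data are assumptions introduced here for illustration.

```python
from itertools import permutations


def coherent_biclusters(data, k, min_genes=2):
    """Brute-force illustration (hypothetical helper, not the paper's algorithm).

    data: list of M rows (genes), each a list of N expression levels (conditions).
    Returns a dict mapping each ordered K-tuple of condition indices to the list
    of gene indices whose levels are nondecreasing along that ordering.
    """
    n_conditions = len(data[0])
    result = {}
    # Every ordered selection of k distinct conditions is a candidate ordering.
    for order in permutations(range(n_conditions), k):
        genes = [
            g
            for g, row in enumerate(data)
            # Coherent evolution: each level is <= the next one in the ordering.
            if all(row[a] <= row[b] for a, b in zip(order, order[1:]))
        ]
        if len(genes) >= min_genes:
            result[order] = genes
    return result


# Toy data set (assumed values): M = 3 genes, N = 4 conditions.
data = [
    [1, 3, 2, 4],
    [0, 5, 1, 6],
    [4, 1, 3, 2],
]
biclusters = coherent_biclusters(data, k=3)
```

In this toy example, genes 0 and 1 are both nondecreasing over the condition ordering (0, 2, 1), so they form one bicluster, whereas gene 2 alone follows the ordering (1, 2, 0) and is dropped by the minimum-gene threshold.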
