Gene Expression Data Analysis Using a Novel Approach to Biclustering Combining Discrete and Continuous Data

Many different methods exist for pattern detection in gene expression data. In contrast to classical clustering methods, biclustering can group a subset of genes together with a subset of conditions (e.g., replicates, sets of patients, or drug compounds). However, since the problem is NP-complete, most algorithms rely on heuristic search and may therefore converge to local maxima. By using the results of biclustering on discrete data as the starting point for a local search on the continuous data, our algorithm avoids the problem of heuristic initialization. Similar to OPSM, our algorithm aims to detect biclusters whose rows and columns can be ordered such that row values increase across the bicluster's columns and, conversely, column values increase across its rows. Results were generated on the yeast genome (Saccharomyces cerevisiae), a human cancer dataset and random data. On the yeast genome, 89% of the one hundred largest non-overlapping biclusters were enriched for Gene Ontology annotations. A comparison with OPSM and ISA showed better efficiency when both gene and condition orders are used. We present results on random and real datasets that show the ability of our algorithm to capture statistically significant and biologically relevant biclusters.
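The ordering criterion and the seeded local search can be illustrated with a small sketch. It assumes strict increases (no ties, no noise tolerance) and a dense matrix stored as nested Python lists. The helper names `is_doubly_ordered` and `grow`, and the greedy "add any row or column that keeps the order" search, are hypothetical illustrations, not the authors' algorithm. The seed rows and columns stand in for a bicluster found on the discrete data, a step that is not shown. Because strict increase forces distinct values, the only candidate column order is the one that sorts any single selected row, so testing that one permutation is enough (and likewise for rows).

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _strictly_increasing(values: Sequence[float]) -> bool:
    """True if each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def _rows_share_column_order(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> bool:
    """Does one column order make every selected row strictly increasing?

    With strict increases, the only candidate order is the one that sorts the
    first selected row, so testing that single permutation is sufficient.
    """
    if not rows:
        return True
    order = sorted(cols, key=lambda c: matrix[rows[0]][c])
    return all(_strictly_increasing([matrix[r][c] for c in order]) for r in rows)


def is_doubly_ordered(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> bool:
    """Rows increase along a common column order and columns along a common row order."""
    transposed = [list(column) for column in zip(*matrix)]
    return (_rows_share_column_order(matrix, rows, cols)
            and _rows_share_column_order(transposed, cols, rows))


def grow(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical greedy local search: keep adding any row or column that
    preserves the doubly ordered property until nothing more can be added."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for r in range(len(matrix)):
            if r not in rows and is_doubly_ordered(matrix, rows + [r], cols):
                rows.append(r)
                changed = True
        for c in range(len(matrix[0])):
            if c not in cols and is_doubly_ordered(matrix, rows, cols + [c]):
                cols.append(c)
                changed = True
    return sorted(rows), sorted(cols)


# Toy continuous matrix; the seed stands in for a bicluster found on discrete data.
data = [
    [1.0, 2.0, 3.0, 0.5],
    [2.0, 3.0, 5.0, 9.0],
    [4.0, 6.0, 7.0, 1.0],
    [3.0, 1.0, 2.0, 8.0],
]
print(grow(data, [0, 1], [0, 1]))  # -> ([0, 1, 2], [0, 1, 2])
```

Row 3 is rejected because it reads 3, 1, 2 on the bicluster's columns, and column 3 is rejected because it breaks the column order shared by the other rows. A practical version would need a tolerance for noisy or tied expression values, which this sketch deliberately leaves out.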
