Noise-robust algorithm for identifying functionally associated biclusters from gene expression data

Biclusters are subsets of genes that exhibit similar behavior over a set of conditions. Biclustering algorithms are useful for uncovering groups of genes involved in the same cellular processes, together with the conditions under which these processes take place. In this paper, we propose a polynomial-time algorithm for identifying biclusters whose genes are highly functionally correlated. Our algorithm identifies (1) gene sets that simultaneously exhibit additive, multiplicative, and combined patterns while tolerating high levels of noise, (2) multiple, possibly overlapping, and diverse gene sets, (3) biclusters that simultaneously contain negatively and positively correlated gene sets, and (4) gene sets for which the functional association is very high. We validate the level of functional association of the biclusters found by our method using the GO database, protein-protein interactions, and KEGG pathways.
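To make the pattern types named above concrete, the following minimal sketch builds toy expression profiles that are additive (shifted), multiplicative (scaled), combined (shifted and scaled), and negatively correlated with respect to a base profile. It is an illustration only and not the algorithm proposed in the paper: the profile values, the constants, and the choice of Pearson correlation as the similarity measure are all assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical expression profile of one gene over five conditions.
base = [1.0, 3.0, 2.0, 5.0, 4.0]

# Toy profiles for each pattern type (constants chosen arbitrarily).
patterns = {
    "additive": [v + 2.0 for v in base],          # shift only
    "multiplicative": [1.5 * v for v in base],    # scale only
    "combined": [1.5 * v + 2.0 for v in base],    # shift and scale
    "negative": [-2.0 * v + 10.0 for v in base],  # negatively correlated
}

# Correlation of each toy profile with the base profile.
correlations = {name: pearson(base, row) for name, row in patterns.items()}

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

In this noise-free toy setting, the additive, multiplicative, and combined profiles are all perfectly positively correlated with the base profile, and the last profile is perfectly negatively correlated. With real, noisy data the correlations would fall short of these extremes, which is why noise tolerance matters.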
