Biclustering of gene expression data

Biclustering is an important problem that arises in diverse applications, including the analysis of gene expression and drug interaction data. Many clustering approaches have been proposed for gene expression data obtained from microarray experiments. However, applying standard clustering methods to genes yields limited results, because under a number of experimental conditions or samples the expression levels of the same genes are uncorrelated. A similar limitation arises when conditions are clustered. The goal of biclustering is to find submatrices of genes and conditions (or samples) in which the genes have nearly the same expression levels for nearly all conditions. Several clustering methods have been adopted or proposed for this task. However, some concerns remain, such as the robustness of mining methods to noise and to the choice of input parameters. In this paper we tackle the problem of effectively clustering gene expression data by proposing an algorithm that uses a density-based approach to identify clusters. Our experimental results show that the algorithm is effective.
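To make the notion of a bicluster concrete, the following minimal sketch checks whether a chosen set of genes and conditions forms a submatrix in which the genes have nearly the same expression levels for nearly all conditions. It is an illustration only, not the algorithm proposed in the paper: the function names, the tolerance `tol`, the threshold `min_fraction`, and the toy matrix are all assumptions introduced here, and "nearly the same" is read as "the spread of values within a condition is at most `tol`".

```python
def coherent_fraction(matrix, genes, conditions, tol):
    """Fraction of the selected conditions in which the selected genes
    have expression levels within `tol` of one another."""
    agreeing = 0
    for c in conditions:
        column = [matrix[g][c] for g in genes]
        if max(column) - min(column) <= tol:
            agreeing += 1
    return agreeing / len(conditions)


def is_bicluster(matrix, genes, conditions, tol=0.5, min_fraction=0.75):
    """Hypothetical criterion: the genes agree in 'nearly all' conditions,
    i.e. in at least `min_fraction` of them."""
    return coherent_fraction(matrix, genes, conditions, tol) >= min_fraction


# Toy expression matrix (made-up values): rows are genes, columns are conditions.
expression = [
    [2.0, 2.1, 9.0, 2.0, 5.0],
    [2.1, 2.0, 1.0, 2.2, 0.5],
    [1.9, 2.2, 4.0, 2.1, 7.5],
    [8.0, 0.3, 6.0, 5.5, 1.0],
]

if __name__ == "__main__":
    # Genes 0-2 agree under conditions 0, 1 and 3 but not under 2 or 4.
    print(is_bicluster(expression, [0, 1, 2], [0, 1, 3]))
    # Clustering all genes over all conditions hides this local pattern.
    print(is_bicluster(expression, [0, 1, 2, 3], [0, 1, 2, 3, 4]))
```

The second call illustrates the limitation of standard clustering described above: the genes are uncorrelated under some conditions, so the pattern is visible only on a subset of rows and columns.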
