Biclustering of Expression Data

An efficient node-deletion algorithm is introduced to find submatrices of expression data that have low mean squared residue scores, and it is shown to perform well in finding co-regulation patterns in yeast and human expression data. The work thereby introduces "biclustering", the simultaneous clustering of both genes and conditions, to knowledge discovery from expression data. This approach overcomes some problems associated with traditional clustering methods by allowing automatic discovery of similarity based on a subset of attributes, simultaneous clustering of genes and conditions, and overlapping groupings that better represent genes with multiple functions or genes regulated by many factors.
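
The abstract names the mean squared residue score and node deletion without defining them. The sketch below assumes the usual definition of the score for a submatrix with row set I and column set J: the mean over all cells of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij and a_IJ are the row, column and submatrix means. The deletion loop is a simplified greedy illustration, not the paper's full algorithm, and the function names, the threshold name delta and the toy matrix are all hypothetical.

```python
def residues(matrix, rows, cols):
    """Residue a_ij - a_iJ - a_Ij + a_IJ for every cell of the submatrix."""
    row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    return {
        (i, j): matrix[i][j] - row_mean[i] - col_mean[j] + all_mean
        for i in rows
        for j in cols
    }


def mean_squared_residue(matrix, rows, cols):
    """Mean of the squared residues over the submatrix (assumed definition)."""
    res = residues(matrix, rows, cols)
    return sum(r * r for r in res.values()) / len(res)


def single_node_deletion(matrix, delta):
    """Greedily drop the worst row or column until the score is at most delta.

    Simplified sketch: one row or column is removed per iteration, chosen
    by its mean squared residue within the current submatrix.
    """
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    while len(rows) > 1 and len(cols) > 1:
        res = residues(matrix, rows, cols)
        score = sum(r * r for r in res.values()) / len(res)
        if score <= delta:
            break
        row_score = {i: sum(res[i, j] ** 2 for j in cols) / len(cols) for i in rows}
        col_score = {j: sum(res[i, j] ** 2 for i in rows) / len(rows) for j in cols}
        worst_row = max(rows, key=row_score.get)
        worst_col = max(cols, key=col_score.get)
        if row_score[worst_row] >= col_score[worst_col]:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols


# Toy data: rows 0-2 follow an additive pattern, row 3 does not.
toy = [
    [1, 2, 3],
    [2, 3, 4],
    [4, 5, 6],
    [9, 0, 5],
]
kept_rows, kept_cols = single_node_deletion(toy, delta=0.01)
```

On this toy matrix the loop removes the one row that breaks the additive pattern and keeps the rest. A score near zero means each kept row differs from the others only by a constant offset across the kept columns, which is the kind of coherence the abstract associates with co-regulation.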
