Co-clustering of biological networks and gene expression data

MOTIVATION: Large-scale gene expression data are often analysed by clustering genes on the basis of the expression data alone, even though a priori knowledge in the form of biological networks is available. Using this additional information promises to improve exploratory analysis considerably.

RESULTS: We propose constructing a distance function that combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure are computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data, and they suggest that our method can automatically identify processes that are relevant under the measured conditions.
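The general idea of combining a network distance with an expression distance and then clustering hierarchically can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made for illustration only: the weighted sum with parameter `alpha`, the use of unweighted hop counts as the graph distance, the treatment of genes as network vertices, average linkage, the fixed target number of clusters `k` (the abstract instead uses a statistical measure to choose it), and the toy data.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def correlation_distance(x, y):
    """1 - Pearson correlation; assumes both profiles are non-constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def hop_counts(adj, source):
    """Breadth-first search: hop count from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def combined_distance(genes, expr, adj, alpha=0.5):
    """Weighted sum of scaled graph and expression distances (an assumption)."""
    hops = {g: hop_counts(adj, g) for g in genes}
    pairs = list(combinations(genes, 2))
    max_hops = max((hops[a].get(b, 0) for a, b in pairs), default=0) or 1
    d = {}
    for a, b in pairs:
        # Unreachable pairs get the largest observed hop count.
        d_net = hops[a].get(b, max_hops) / max_hops
        d_exp = correlation_distance(expr[a], expr[b]) / 2.0  # scale to [0, 1]
        d[frozenset((a, b))] = alpha * d_net + (1 - alpha) * d_exp
    return d


def average_linkage(genes, d, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [{g} for g in genes]

    def link(c1, c2):
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: link(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical toy data: four genes on a chain-shaped network.
genes = ["g1", "g2", "g3", "g4"]
adj = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [4.2, 2.8, 2.1, 0.9],
}
clusters = average_linkage(genes, combined_distance(genes, expr, adj), k=2)
```

In this toy example, genes that are both adjacent in the network and co-expressed end up together. Setting `alpha` to 0 reduces the sketch to clustering on expression alone, and setting it to 1 reduces it to clustering on network distance alone.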
