Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions.

The complexity of biological systems gives rise to a great diversity of relationships between genes. The current analysis of whole-genome expression data focuses on relationships based on global correlation over a whole time-course, identifying clusters of genes whose expression levels simultaneously rise and fall. There are, of course, other potential relationships between genes, which are missed by such global clustering. These include activation, where one expects a time delay between related expression profiles, and inhibition, where one expects an inverted relationship. Here, we propose a new method, which we call local clustering, for identifying these time-delayed and inverted relationships. It is related to conventional gene-expression clustering in a fashion analogous to the way local sequence alignment (the Smith-Waterman algorithm) is derived from global alignment (Needleman-Wunsch). An integral part of our method is the use of random score distributions to assess the statistical significance of each cluster. We applied our method to the yeast cell-cycle expression dataset and were able to detect a considerable number of additional biological relationships between genes, beyond those resulting from conventional correlation. We related these new relationships between genes to their similarity in function (as determined from the MIPS scheme) or their having known protein-protein interactions (as determined from the large-scale two-hybrid experiment); we found that genes strongly related by local clustering were considerably more likely than random to have a known interaction or a similar cellular role. This suggests that local clustering may be useful in functional annotation of uncharacterized genes. We examined many of the new relationships in detail. Some of them were already well-documented examples of inhibition or activation, which provide corroboration for our results.
For instance, we found an inverted expression profile relationship between genes YME1 and YNT20, where the latter has been experimentally documented as a bypass suppressor of the former. We also found new relationships involving uncharacterized yeast genes and were able to suggest functions for many of them. In particular, we found a time-delayed expression relationship between J0544 (which has not yet been functionally characterized) and four genes associated with the mitochondria. This suggests that J0544 may be involved in the control or activation of mitochondrial genes. We have also looked at other datasets, less extensive than the yeast cell-cycle dataset, and found further interesting relationships. Our clustering program and a detailed website of clustering results are available at http://www.bioinfo.mbb.yale.edu/expression/cluster (or http://www.genecensus.org/expression/cluster).
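
As an illustration only, the analogy to Smith-Waterman local alignment can be sketched in a few lines of Python. The sketch is not the clustering program referred to above, and every detail in it is an assumption: profiles are z-scored, the score for pairing time point i of one gene with time point j of another is the product of the two normalized values, and two dynamic-programming tables accumulate runs of positive products (simultaneous or time-delayed relationships) and negative products (inverted relationships), resetting to zero as in local alignment. The function names `zscore`, `local_match` and `empirical_p` are hypothetical, and the random score distribution is approximated here by shuffling the time points of one profile, which may differ from the procedure actually used.

```python
import math
import random


def zscore(profile):
    """Normalize a profile to mean 0 and (population) standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    if sd == 0:
        raise ValueError("constant profile cannot be normalized")
    return [(v - mean) / sd for v in profile]


def local_match(x, y):
    """Return (score, kind, shift) for the best locally matching segment.

    kind is "correlated" or "inverted"; shift = j - i, so a positive shift
    means the matching segment occurs later in y than in x.
    """
    x, y = zscore(x), zscore(y)
    n, m = len(x), len(y)
    pos = [[0.0] * (m + 1) for _ in range(n + 1)]  # runs of positive products
    neg = [[0.0] * (m + 1) for _ in range(n + 1)]  # runs of negative products
    best = (0.0, "none", 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = x[i - 1] * y[j - 1]
            # Extend the diagonal run, or restart at zero (local alignment).
            pos[i][j] = max(pos[i - 1][j - 1] + p, 0.0)
            neg[i][j] = max(neg[i - 1][j - 1] - p, 0.0)
            if pos[i][j] > best[0]:
                best = (pos[i][j], "correlated", j - i)
            if neg[i][j] > best[0]:
                best = (neg[i][j], "inverted", j - i)
    return best


def empirical_p(x, y, trials=200, seed=0):
    """Estimate significance by shuffling y's time points (an assumption)."""
    rng = random.Random(seed)
    observed = local_match(x, y)[0]
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_match(x, shuffled)[0] >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    pulse = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0]
    delayed = [0, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0, 0]  # same pulse, two points later
    print(local_match(pulse, delayed))
    print(local_match(pulse, [-v for v in pulse]))
    print(empirical_p(pulse, delayed))
```

In this toy example the delayed pulse should be reported as a correlated match at a shift of two time points, and the negated pulse as an inverted match at zero shift. A global correlation coefficient over the whole time-course would rate the delayed pair as only weakly related, which is the kind of relationship the abstract describes as being missed by global clustering.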
