Hypothesis-driven approach to predict transcriptional units from gene expression data

MOTIVATION: A major issue in computational biology is the reconstruction of functional relationships among genes, for example the definition of regulatory or biochemical pathways. One step towards this aim is the elucidation of transcriptional units, which are characterized by co-responding changes in mRNA expression levels. These units of genes allow hypotheses to be generated about their functional interrelationships. Thus, the focus of analysis is currently moving from well-established functional assignment through comparison of protein and DNA sequences towards analysis of transcriptional co-response. Tools that allow common control of gene expression to be deduced have the potential to complement and extend routine BLAST comparisons, because gene function may be inferred from common transcriptional control.

RESULTS: We present a strategy for co-clustering genome sequence information and gene expression data, which was applied to identify transcriptional units within diverse compendia of expression profiles. Prokaryotic operons were selected as an ideal test case for generating well-founded hypotheses about transcriptional units. The existence of overlapping and ambiguous operon definitions allowed the investigation of constitutive and conditional expression of transcriptional units in independent gene expression experiments of Escherichia coli. Our approach allowed identification of operons with high accuracy. Furthermore, both constitutive mRNA co-response and conditional differences became apparent. Thus, we were able to gain insight into the possible biological relevance of gene co-response. We conclude that the suggested strategy will be generally applicable to the identification of transcriptional units beyond the chosen example of E. coli operons.

AVAILABILITY: The analyses of E. coli transcript data presented here are available upon request or at http://csbdb.mpimp-golm.mpg.de/
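To make the notion of co-response between neighbouring genes concrete, the following is a minimal sketch and not the co-clustering method of the paper. It assumes that co-response is measured by the Pearson correlation of expression profiles, that genes are listed in chromosomal order, and that a fixed cutoff of 0.8 separates units. The gene names, the expression values, the cutoff, and the function names `pearson` and `group_adjacent` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A flat profile carries no co-response information.
        return 0.0
    return cov / sqrt(vx * vy)


def group_adjacent(genes, profiles, threshold=0.8):
    """Greedily group neighbouring genes whose profiles correlate.

    genes: gene names in chromosomal order (assumed input layout).
    profiles: dict mapping gene name to a list of expression values.
    threshold: hypothetical correlation cutoff, not taken from the paper.
    """
    if not genes:
        return []
    units = []
    current = [genes[0]]
    for prev, gene in zip(genes, genes[1:]):
        if pearson(profiles[prev], profiles[gene]) >= threshold:
            current.append(gene)  # extend the current candidate unit
        else:
            units.append(current)  # close it and start a new one
            current = [gene]
    units.append(current)
    return units


# Toy data, invented for illustration only.
genes = ["geneA", "geneB", "geneC", "geneD"]
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 1.0, 3.0, 2.0],
    "geneD": [3.9, 1.2, 3.1, 1.8],
}
units = group_adjacent(genes, profiles)
print(units)
```

In this toy input, geneA and geneB rise together and geneC and geneD share a different pattern, so the sketch proposes two candidate units. A real analysis would also need strand and intergenic distance information and a statistically justified cutoff, none of which is modelled here.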
