Cluster analysis and promoter modelling as bioinformatics tools for the identification of target genes from expression array data.

Expression arrays yield enormous amounts of data linking genes, via their cDNA sequences, to gene expression patterns. This now allows the characterisation of gene expression in normal and diseased tissues, as well as of the response of tissues to therapeutic reagents. Expression array data can be analysed with respect to the underlying protein sequences, which facilitates the precise determination of when and where certain groups of genes are expressed. More recent clustering algorithms take additional parameters of the experimental set-up into account, focusing more directly on co-regulated sets of genes. However, the information concerning the transcriptional regulatory networks responsible for the observed expression patterns is not contained within the cDNA sequences used to generate the arrays. Regulation of expression is determined to a large extent by the promoter (and/or enhancer) sequences of the individual genes. The complete sequence of the human genome now provides the molecular basis for identifying many of these regulatory regions. Promoter sequences for specific cDNAs can be obtained reliably from genomic sequences by exon mapping. In the many cases in which cDNAs are 5'-incomplete, high-quality promoter prediction tools can be used to locate promoters directly in the genomic sequence. Once sufficient numbers of promoter sequences have been obtained, comparative promoter analysis of co-regulated genes and groups of genes can be used to generate models describing the higher-order organisation of transcription factor binding sites (modules) within these promoter regions. Such modules represent the molecular mechanisms through which regulatory networks influence gene expression, and candidate modules can be determined by bioinformatics alone.
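
The clustering step can be illustrated with a minimal sketch. It assumes each gene is represented by a list of expression values measured on the same arrays (for example log ratios) and uses average-linkage agglomerative clustering with a correlation distance of 1 - r. This is one common choice for grouping co-expressed genes, not the specific algorithm of any tool discussed here; the function names `pearson` and `cluster_profiles` and the stopping threshold `max_distance` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A flat profile has no defined correlation; treat it as uncorrelated.
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def cluster_profiles(profiles, max_distance=0.3):
    """Average-linkage agglomerative clustering with distance 1 - r.

    profiles maps a gene name to its expression values (assumed to be
    measured on the same arrays, e.g. log ratios). Merging stops once the
    closest pair of clusters is farther apart than max_distance.
    """
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(c1, c2):
        # Mean of all gene-to-gene distances between the two clusters.
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [[gene] for gene in profiles]
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Illustrative toy profiles (not real data).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],  # same shape as geneA, higher level
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.0],  # same shape as geneC
}
print(cluster_profiles(profiles))
# → [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Clustering on correlation rather than Euclidean distance groups genes by the shape of their profiles, so geneB, which rises in step with geneA at about twice the level, still falls into geneA's cluster.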
This approach also provides a powerful alternative to sequence-similarity searches for elucidating the functional features of genes that have no detectable sequence similarity to characterised genes, by linking them to other genes on the basis of their common promoter structures.
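
Linking genes through a shared promoter module can be sketched in the same spirit. Here a module is assumed to be two binding sites in a fixed order, with the second site starting between `min_gap` and `max_gap` bases after the end of the first. Promoter-modelling tools normally score sites with weight matrices and search both strands; for brevity this sketch substitutes IUPAC consensus matching on the given strand only. The consensus strings, gene names and sequences in the example are illustrative placeholders, and the functions `site_positions`, `has_module` and `genes_sharing_module` are hypothetical.

```python
import re

# Assumption: sites are given as IUPAC consensus strings, a simplification of
# the weight-matrix scoring used by real promoter-analysis tools.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def site_positions(consensus, sequence):
    """Start positions of all (possibly overlapping) matches on the given strand."""
    pattern = "".join(IUPAC[base] for base in consensus)
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence)]


def has_module(sequence, first, second, min_gap, max_gap):
    """True if a `first` site is followed by a `second` site that starts
    min_gap..max_gap bases after the end of the `first` site."""
    seq = sequence.upper()
    second_starts = site_positions(second, seq)
    for start in site_positions(first, seq):
        end = start + len(first)
        if any(min_gap <= s - end <= max_gap for s in second_starts):
            return True
    return False


def genes_sharing_module(promoters, module):
    """Sorted names of genes whose promoter sequence contains the module."""
    first, second, min_gap, max_gap = module
    return sorted(
        name for name, seq in promoters.items()
        if has_module(seq, first, second, min_gap, max_gap)
    )


# Illustrative placeholder module and promoter fragments (not real data).
module = ("TGASTCA", "CCAAT", 2, 20)
promoters = {
    "gene1": "AAAA" + "TGACTCA" + "GGG" + "CCAAT" + "TTTT",  # gap 3: match
    "gene2": "CCAAT" + "GGG" + "TGAGTCA" + "AAAA",           # wrong order
    "gene3": "TGACTCA" + "A" * 30 + "CCAAT",                 # gap 30: too far
    "gene4": "tgagtca" + "tt" + "ccaat",                     # gap 2: match
}
print(genes_sharing_module(promoters, module))  # → ['gene1', 'gene4']
```

Because only site order and spacing are tested, gene1 and gene4 are linked even though their flanking sequences differ, which is the sense in which a common promoter structure can connect genes that lack overall sequence similarity.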