Multiple non-collinear TF-map alignments of promoter regions

Background: The analysis of the promoter sequences of genes with similar expression patterns is a basic tool to annotate common regulatory elements. Multiple sequence alignments form the basis of most comparative approaches. The characterization of regulatory regions from co-expressed genes at the sequence level, however, often fails to yield satisfactory results, because promoter regions of genes sharing similar expression programs frequently show no nucleotide sequence conservation.

Results: In a recent approach to circumvent this limitation, we proposed to align the maps of predicted transcription factors (referred to as TF-maps) instead of the nucleotide sequences of two related promoters, taking into account the label of the corresponding factor and its position in the primary sequence. We have now extended the basic algorithm to permit multiple promoter comparisons using the progressive alignment paradigm. In addition, non-collinear conservation blocks can now be identified in the resulting alignments. We have optimized the parameters of the algorithm on a small but well-characterized collection of human-mouse-chicken-zebrafish orthologous gene promoters.

Conclusion: Results on this dataset indicate that TF-map alignments are able to detect high-level regulatory conservation at the promoter and 3' UTR regions of genes, which cannot be detected by typical sequence alignments. Three particular examples are introduced here to illustrate the power of multiple TF-map alignments to characterize conserved regulatory elements in the absence of sequence similarity. We believe this kind of approach can be extremely useful in the future to annotate potential transcription factor binding sites on sets of co-regulated genes from high-throughput expression experiments.
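To make the idea of aligning TF-maps rather than nucleotides concrete, the following is a minimal sketch of a pairwise, collinear alignment between two maps of predicted sites. It is not the authors' algorithm: the `Site` record, the `align_tf_maps` function, and the scoring (sum of the prediction scores of matched sites, with no gap or distance penalties) are assumptions made for illustration. Positions are used here only to order the sites, and the progressive multiple alignment and non-collinear blocks described above are not covered.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    """Hypothetical record for one predicted binding site in a TF-map."""
    label: str    # name of the transcription factor
    pos: int      # position in the promoter sequence
    score: float  # prediction score of the site


def align_tf_maps(a: List[Site], b: List[Site]) -> Tuple[float, List[Tuple[Site, Site]]]:
    """Align two TF-maps by dynamic programming.

    Only sites with the same label may be paired, and pairs must keep the
    same left-to-right order in both maps (collinear). The score of an
    alignment is the sum of the scores of all paired sites (an assumption,
    not the scoring function of the original method).
    """
    a = sorted(a, key=lambda s: s.pos)
    b = sorted(b, key=lambda s: s.pos)
    n, m = len(a), len(b)

    # best[i][j]: best score aligning the first i sites of a with the first j of b
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            if a[i - 1].label == b[j - 1].label:
                match = best[i - 1][j - 1] + a[i - 1].score + b[j - 1].score
                best[i][j] = max(best[i][j], match)

    # Trace back from the bottom-right cell to recover the paired sites
    pairs: List[Tuple[Site, Site]] = []
    i, j = n, m
    while i > 0 and j > 0:
        same = a[i - 1].label == b[j - 1].label
        if same and best[i][j] == best[i - 1][j - 1] + a[i - 1].score + b[j - 1].score:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


# Toy maps with made-up sites: the two promoters share no sequence here,
# only the labels and order of their predicted factors.
human = [Site("SP1", 10, 1.0), Site("TBP", 50, 2.0), Site("MEF2", 80, 1.5)]
mouse = [Site("MEF2", 5, 1.0), Site("SP1", 20, 1.0), Site("TBP", 70, 2.0)]
total, matched = align_tf_maps(human, mouse)
```

In this toy input the MEF2 sites cannot be paired together with SP1 and TBP without breaking the shared order, so a collinear alignment leaves them out. This is the kind of case that non-collinear conservation blocks are meant to capture.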
