Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

Background

A substantial fraction of the non-coding DNA of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure.

Results

We investigated the relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500 relative to the start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency of A's and T's to clump, probably due to context-dependent mutation, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of the short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family.

Conclusion

Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that the selection which causes their conservation is not always very strong.
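
The two kinds of comparison described in the Results can be illustrated with a minimal Python sketch. It is not the procedure used in the study, whose statistics are not given here. The function names, the pseudocount, and the choice to model background (iii) by the expected count under independent nucleotide frequencies (rather than by generating random sequences) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import prod

ALPHABET = "ACGT"

def kmer_counts(seqs, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in ALPHABET for base in kmer):
                counts[kmer] += 1
    return counts

def base_freqs(seqs):
    """Relative frequency of each nucleotide across all sequences."""
    counts = Counter(b for s in seqs for b in s.upper() if b in ALPHABET)
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}

def enrichment_vs_composition(seqs, k):
    """Observed/expected ratio per k-mer, where the expectation assumes
    independent nucleotides with the same composition as the input
    (an analytic stand-in for a composition-matched random background)."""
    observed = kmer_counts(seqs, k)
    freqs = base_freqs(seqs)
    n_windows = sum(observed.values())
    return {
        kmer: count / (n_windows * prod(freqs[b] for b in kmer))
        for kmer, count in observed.items()
    }

def enrichment_vs_background(foreground, background, k, pseudocount=1.0):
    """Ratio of k-mer frequencies in a foreground set (e.g. CNSs) to those
    in an empirical background set (e.g. non-CNSs or near-promoter
    sequences). The pseudocount (an assumption) avoids division by zero."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    n_fg = sum(fg.values()) + pseudocount * 4 ** k
    n_bg = sum(bg.values()) + pseudocount * 4 ** k
    ratios = {}
    for letters in product(ALPHABET, repeat=k):
        kmer = "".join(letters)
        ratios[kmer] = ((fg[kmer] + pseudocount) / n_fg) / (
            (bg[kmer] + pseudocount) / n_bg
        )
    return ratios
```

A ratio above 1 from `enrichment_vs_background` marks a motif that is more frequent in the foreground than in the chosen background; a ratio above 1 from `enrichment_vs_composition` marks a motif that is more frequent than nucleotide composition alone would predict. As the Results indicate, the same motif can score very differently depending on which background is used.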
