Conservation patterns in different functional sequence categories of divergent Drosophila species.

We have explored the distributions of fully conserved ungapped blocks in genome-wide pair-wise alignments of recently completed species of Drosophila: D. melanogaster, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis, and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs, and nonfunctional sequences. At the same time, the blocks in the coding regions carried a 3N + 2 signature characteristic of synonymous substitutions in the third-codon position. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or "ultraconserved" sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discuss how restraining conservation patterns may help in mapping functional sequence categories and improve genome annotation.

[1]  E. Davidson,et al.  The hardwiring of development: organization and function of genomic regulatory systems. , 1997, Development.

[2]  Inna Dubchak,et al.  Glocal alignment: finding rearrangements during alignment , 2003, ISMB.

[3]  Eugene V Koonin,et al.  Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals. , 2004, Nucleic acids research.

[4]  W. Fitch Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology , 1971 .

[5]  J. T. Kadonaga,et al.  The Downstream Promoter Element DPE Appears To Be as Widely Used as the TATA Box in Drosophila Core Promoters , 2000, Molecular and Cellular Biology.

[6]  G. Rubin,et al.  Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[7]  M. Kreitman,et al.  Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. , 2001, Genome research.

[8]  Eldon Emberly,et al.  Conservation of regulatory elements between two species of Drosophila , 2003, BMC Bioinformatics.

[9]  Anna G. Nazina,et al.  Distance preferences in the arrangement of binding motifs and hierarchical levels in organization of transcription regulatory information. , 2003, Nucleic acids research.

[10]  Simon Whelan,et al.  Statistical Methods in Molecular Evolution , 2005 .

[11]  D. Haussler,et al.  Article Identification and Characterization of Multi-Species Conserved Sequences , 2022 .

[12]  A. Gnirke,et al.  Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome , 2002, Genome Biology.

[13]  Michael Ashburner,et al.  Drosophila melanogaster: a case study of a model genomic sequence and its consequences. , 2005, Genome research.

[14]  Mei Li,et al.  MultiPipMaker and supporting tools: alignments and analysis of multiple genomic DNA sequences , 2003, Nucleic Acids Res..

[15]  Jens Stoye,et al.  Benchmarking tools for the alignment of functional noncoding DNA , 2004, BMC Bioinformatics.

[16]  Dmitri Papatsenko,et al.  Computational identification of regulatory DNAs underlying animal development , 2005, Nature Methods.

[17]  Anna G. Nazina,et al.  Homotypic regulatory clusters in Drosophila. , 2003, Genome research.

[18]  A. Clark,et al.  Tracing the evolutionary history of Drosophila regulatory regions with models that identify transcription factor binding sites. , 2003, Molecular biology and evolution.

[19]  Gill Bejerano,et al.  Ultraconserved elements in insect genomes: a highly conserved intronic sequence implicated in the control of homothorax mRNA splicing. , 2005, Genome research.

[20]  Michael Ashburner,et al.  Annotation of the Drosophila melanogaster euchromatic genome: a systematic review , 2002, Genome Biology.

[21]  L. Pennacchio,et al.  Genomic strategies to identify mammalian regulatory sequences , 2001, Nature Reviews Genetics.

[22]  Kristin C. Gunsalus,et al.  microRNA Target Predictions across Seven Drosophila Species and Comparison to Mammalian Targets , 2005, PLoS Comput. Biol..

[23]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[24]  Sam Griffiths-Jones,et al.  The microRNA Registry , 2004, Nucleic Acids Res..

[25]  Chuong B. Do,et al.  Access the most recent version at doi: 10.1101/gr.926603 References , 2003 .

[26]  Nicholas L. Bray,et al.  AVID: A global alignment program. , 2003, Genome research.

[27]  R. Gibbs,et al.  PipMaker--a web server for aligning two genomic DNA sequences. , 2000, Genome research.

[28]  D. Haussler,et al.  Ultraconserved Elements in the Human Genome , 2004, Science.

[29]  David Haussler,et al.  Combining Phylogenetic and Hidden Markov Models in Biosequence Analysis , 2004, J. Comput. Biol..

[30]  Lior Pachter,et al.  VISTA: computational tools for comparative genomics , 2004, Nucleic Acids Res..

[31]  Long Li,et al.  REDfly: a Regulatory Element Database for Drosophila , 2006, Bioinform..

[32]  Dmitri A. Papatsenko,et al.  Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency , 2003, BMC Bioinformatics.

[33]  C. V. Jongeneel,et al.  Numerous potentially functional but non-genic conserved sequences on human chromosome 21 , 2002, Nature.

[34]  D. Haussler,et al.  Human-mouse alignments with BLASTZ. , 2003, Genome research.

[35]  Lior Pachter,et al.  VISTA : visualizing global DNA sequence alignments of arbitrary length , 2000, Bioinform..

[36]  S. Salzberg,et al.  Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura , 2004, Genome Biology.