Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome

Background
It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined.

Results
We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis), covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Adding a third species can reveal functional constraint in exon comparisons that are not significant pairwise. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes than in genes that have been studied previously. Conserved non-coding sequences (CNCSs) tend to be spatially clustered, with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences.

Conclusions
Our results provide the basis for choosing the species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.
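The observation that CNCSs cluster, and that such clusters can flag candidate enhancers, can be illustrated with a simple distance-based grouping. The sketch below assumes CNCSs have already been identified and are given as (start, end) intervals in D. melanogaster coordinates. The function name `cluster_cncs` and the thresholds `max_gap` and `min_members` are hypothetical illustrations, not the method or parameter values used in this study.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) of one CNCS; hypothetical input format


@dataclass(frozen=True)
class Cluster:
    start: int                       # start of the first CNCS in the cluster
    end: int                         # furthest end coordinate reached by any member
    members: Tuple[Interval, ...]    # the CNCS intervals grouped together


def cluster_cncs(cncs: List[Interval], max_gap: int = 200,
                 min_members: int = 3) -> List[Cluster]:
    """Group CNCSs separated by at most max_gap bp (single-linkage on gaps).

    Groups with fewer than min_members CNCSs are discarded; the survivors are
    reported as candidate enhancer regions. Both thresholds are illustrative
    assumptions, not values taken from the study.
    """
    if not cncs:
        return []
    ordered = sorted(cncs)           # sort by start coordinate
    clusters: List[Cluster] = []
    current = [ordered[0]]
    current_end = ordered[0][1]

    for start, end in ordered[1:]:
        if start - current_end <= max_gap:
            # Close enough to the running cluster: extend it.
            current.append((start, end))
            current_end = max(current_end, end)
        else:
            # Gap too large: close the running cluster and start a new one.
            if len(current) >= min_members:
                clusters.append(Cluster(current[0][0], current_end, tuple(current)))
            current = [(start, end)]
            current_end = end

    # Close the final running cluster.
    if len(current) >= min_members:
        clusters.append(Cluster(current[0][0], current_end, tuple(current)))
    return clusters


if __name__ == "__main__":
    # Hypothetical CNCS coordinates, for illustration only.
    example = [(100, 130), (250, 290), (400, 420), (2000, 2030),
               (5000, 5025), (5100, 5140), (5260, 5300), (5350, 5380)]
    print([(c.start, c.end) for c in cluster_cncs(example)])
    # -> [(100, 420), (5000, 5380)]
```

Grouping on inter-CNCS gaps is only one way to define a cluster; a fixed-width sliding window that counts CNCSs is an obvious alternative. Either way, the thresholds would need to be calibrated, for example against experimentally characterized enhancers.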
