The global landscape of sequence diversity

Background
Systematic comparisons between genomic sequence datasets have revealed a wide spectrum of sequence specificity, ranging from sequences that are highly conserved to those that are specific to individual species. Because few eukaryotic genomes have been fully sequenced, analyses of this spectrum have largely focused on prokaryotes. Combining existing genomic datasets with the partial genomes of 193 eukaryotes derived from collections of expressed sequence tags, we performed a quantitative analysis of the sequence specificity spectrum to provide a global view of the origins and extent of sequence diversity across the three domains of life.

Results
Comparisons with prokaryotic datasets reveal greater genetic diversity within eukaryotes, which may be related to differences in modes of genetic inheritance. Mapping this diversity within a phylogenetic framework revealed that the majority of sequences are either highly conserved or specific to the species or taxon from which they derive. Between these two extremes, we identified several evolutionary landmarks, each consisting of a large number of sequences conserved within a specific taxonomic group. For example, 8% of sequences derived from metazoan species are specific to, and conserved within, the metazoan lineage. Many of these sequences likely mediate metazoan-specific functions, such as cell-cell communication and differentiation.

Conclusion
Through the use of partial genome datasets, this study provides a unique perspective on sequence conservation across the three domains of life. The provision of taxon-restricted sequences should prove valuable for future computational and biochemical analyses aimed at understanding evolutionary and functional relationships.
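The placement of sequences along the specificity spectrum described in the Results can be illustrated with a minimal Python sketch. It assumes homology searches have already been run, so each sequence comes with the set of species in which homologs were detected, and it assigns the sequence to the lowest taxon spanning its source species and those homologs. A sequence whose lowest common taxon is the root counts as highly conserved, one whose lowest common taxon is its own species counts as species-specific, and one that lands on an intermediate taxon such as Metazoa counts as taxon-restricted. The taxonomy, species names, sequence IDs, and the helpers `lowest_common_taxon`, `classify`, and `taxon_restricted_fraction` are hypothetical illustrations, not the study's pipeline, and the sketch ignores complications such as incomplete EST coverage and lateral gene transfer.

```python
from collections import Counter

# Hypothetical toy taxonomy: each species maps to its lineage from the root down.
LINEAGES = {
    "Homo_sapiens": ["cellular_life", "Eukaryota", "Metazoa", "Chordata"],
    "Drosophila_melanogaster": ["cellular_life", "Eukaryota", "Metazoa", "Arthropoda"],
    "Caenorhabditis_elegans": ["cellular_life", "Eukaryota", "Metazoa", "Nematoda"],
    "Saccharomyces_cerevisiae": ["cellular_life", "Eukaryota", "Fungi"],
    "Escherichia_coli": ["cellular_life", "Bacteria"],
}


def lowest_common_taxon(species):
    """Return the most specific taxon shared by every species in `species`.

    Each lineage is a root-to-leaf path, so the lowest common taxon is the
    last element of the longest common prefix of those paths.
    """
    paths = [LINEAGES[s] + [s] for s in species]  # append the species itself as the leaf
    lca = None
    for ranks in zip(*paths):
        if any(r != ranks[0] for r in ranks):
            break
        lca = ranks[0]
    return lca


def classify(hits):
    """Map each sequence ID to the lowest taxon spanning its source species and homologs."""
    return {
        seq_id: lowest_common_taxon({source} | homologs)
        for seq_id, (source, homologs) in hits.items()
    }


def taxon_restricted_fraction(hits, taxon):
    """Fraction of sequences derived from `taxon` whose homologs are confined to,
    and conserved across, that taxon (lowest common taxon == `taxon`)."""
    labels = classify(hits)
    derived = [s for s, (source, _) in hits.items() if taxon in LINEAGES[source]]
    restricted = [s for s in derived if labels[s] == taxon]
    return len(restricted) / len(derived) if derived else 0.0


# Hypothetical homology results: sequence ID -> (source species, species with homologs).
HITS = {
    "seq1": ("Homo_sapiens", {"Drosophila_melanogaster", "Saccharomyces_cerevisiae", "Escherichia_coli"}),
    "seq2": ("Homo_sapiens", {"Drosophila_melanogaster", "Caenorhabditis_elegans"}),
    "seq3": ("Caenorhabditis_elegans", set()),
    "seq4": ("Drosophila_melanogaster", {"Caenorhabditis_elegans"}),
    "seq5": ("Saccharomyces_cerevisiae", {"Homo_sapiens"}),
}

if __name__ == "__main__":
    print(Counter(classify(HITS).values()))  # how many sequences land on each taxon
    print(taxon_restricted_fraction(HITS, "Metazoa"))  # 0.5 for this toy data
```

Using the lowest common taxon means a sequence counts as Metazoa-restricted only when it has no homologs outside Metazoa and its homologs span at least two of the metazoan sublineages in the toy taxonomy; in the toy data, two of the four metazoan-derived sequences meet that criterion.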
