A case for evolutionary genomics and the comprehensive examination of sequence biodiversity

Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of genome technologies capable of producing large quantities of sequence data (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. Applying genomic methods to evolutionary biology is challenging, in part because gene segments from different organisms are handled separately, each requiring its own purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) is to combine DNA samples from different organisms before cloning and sequencing, and then computationally reconstruct the original sequences. This approach will allow evolutionary studies to realize the full benefit of the automated protocols developed by genome projects, so that taxon sampling for targeted genomes and genomic regions can readily increase to thousands of taxa. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to estimate accurately both the normal course of evolution under constant structural and functional constraints (i.e., site-specific substitution probabilities) and changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and shifts in selective constraints. These estimates can then be used to understand how protein structure and function shape sequence evolution, and to predict unknown details of protein structure, function, and functional divergence.
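The site-level quantities mentioned above can be illustrated with simple alignment statistics. The Python sketch below is a frequency-based proxy, not the likelihood-based estimation of site-specific substitution probabilities envisioned here: it uses the Shannon entropy of residue frequencies in each alignment column as a rough measure of site variability, and the mutual information between pairs of columns as a rough signal of pairwise coevolution. The function names and the toy alignment are hypothetical, and both statistics ignore the phylogenetic relationships among the sequences, which a real analysis must take into account.

```python
import math
from collections import Counter
from itertools import combinations


def column(alignment, i):
    """Residues at site i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def site_entropy(alignment, i):
    """Shannon entropy (bits) of residue frequencies at site i.

    A crude proxy for site-specific variability: 0 for an invariant site,
    larger when many different residues occur at the site.
    """
    counts = Counter(column(alignment, i))
    n = sum(counts.values())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(alignment, i, j):
    """Mutual information (bits) between sites i and j.

    High values mean residues at the two sites co-vary across sequences.
    This ignores shared ancestry, so it is only a rough coevolution signal.
    """
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in p_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def top_coupled_pairs(alignment, k=3):
    """The k site pairs with the highest mutual information."""
    length = len(alignment[0])
    scores = [((i, j), mutual_information(alignment, i, j))
              for i, j in combinations(range(length), 2)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


# Hypothetical toy alignment: sites 0 and 1 change together; site 2 is invariant.
aln = ["ACGA", "ACGT", "GTGA", "GTGT"]
print([site_entropy(aln, i) for i in range(4)])  # [1.0, 1.0, 0.0, 1.0]
print(top_coupled_pairs(aln, k=1))  # [((0, 1), 1.0)]
```

With only four sequences, estimates like these are extremely noisy; the argument above is that sampling thousands of taxa would make per-site and per-pair estimates far more reliable.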
To demonstrate the practicality of these ideas and their potential benefit for functional genomic analysis, we describe an ongoing pilot project to sequence large numbers of vertebrate mitochondrial genomes simultaneously.
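The idea of combining DNA from several taxa before sequencing and recovering each sequence computationally can be illustrated with a toy Python sketch. It assumes, hypothetically, that the pooled taxa are divergent enough that every read clearly resembles one taxon's approximate "seed" sequence (for example, a published genome from a related species), and that reads have already been placed in a shared coordinate frame. Each read is assigned to the closest seed by Hamming distance, and each taxon's sequence is then rebuilt by majority-rule consensus. The names `assign_reads` and `consensus`, the data layout, and the toy data are illustrative assumptions, not the reconstruction pipeline of the pilot project; real data would call for sequence assembly and base-quality information.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatched positions over the overlapping length."""
    return sum(x != y for x, y in zip(a, b))


def assign_reads(reads, seeds):
    """Sort pooled reads into per-taxon bins.

    Hypothetical data layout (an assumption, not from the source):
      reads: list of (start, sequence) pairs placed in a shared coordinate frame
      seeds: dict mapping taxon name -> approximate sequence in that frame
    Each read goes to the taxon whose seed it matches with the fewest
    mismatches; ties go to the alphabetically first taxon.
    """
    bins = defaultdict(list)
    for start, seq in reads:
        best = min(
            sorted(seeds),
            key=lambda taxon: hamming(seq, seeds[taxon][start:start + len(seq)]),
        )
        bins[best].append((start, seq))
    return bins


def consensus(reads, length):
    """Majority-rule consensus of placed reads; 'N' marks uncovered positions."""
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Toy example: reads from two divergent taxa pooled before sequencing.
seeds = {"taxonA": "AAAAAAAAAA", "taxonB": "CCCCCCCCCC"}
pooled_reads = [
    (0, "AAAAT"), (0, "CCGCC"), (3, "ATAAA"), (4, "CCCCC"), (5, "AAAAA"),
]
bins = assign_reads(pooled_reads, seeds)
print(consensus(bins["taxonA"], 10))  # AAAATAAAAA
print(consensus(bins["taxonB"], 10))  # CCGCCCCCCN
```

The reconstructed taxonA sequence keeps the T at position 4 even though its seed has an A there: the seed only guides the sorting of reads, while the consensus reflects the pooled sample itself. Positions that no read covers are reported as N.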