A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study

Background
Molecular systematics occupies a central place in biology in the genomic era, driven by unprecedented progress in DNA technology. Organismal phylogeny is now inferred from many independent genetic loci, a widely accepted approach to assembling the tree of life. However, this approach is hindered by the lack of appropriate nuclear gene markers for many taxonomic groups, especially at high taxonomic levels, partly because tools for efficiently developing new phylogenetic markers are lacking. We report here a genome-comparison strategy for identifying nuclear gene markers for phylogenetic inference and apply it to the ray-finned fishes, the largest vertebrate clade in need of phylogenetic resolution.

Results
A total of 154 candidate molecular markers (relatively well conserved, putatively single-copy gene fragments with long, uninterrupted exons) were obtained by comparing the whole genome sequences of two model organisms, Danio rerio and Takifugu rubripes. Experimental tests of 15 of these markers, picked at random, on 36 taxa (representing two-thirds of the ray-finned fish orders) demonstrate that most of these candidates can be amplified by PCR and directly sequenced from whole genomic DNA across a wide diversity of fish species. Preliminary phylogenetic analyses of sequence data obtained for 14 taxa and 10 markers (a total of 7,872 bp per species) are encouraging and suggest that these markers will make significant contributions to future fish phylogenetic studies.

Conclusion
We present a practical approach that systematically compares whole genome sequences to identify single-copy nuclear gene markers for inferring phylogeny. Our method improves on traditional approaches (e.g., manually picking genes for testing) because it uses genomic information and automates the identification of large numbers of candidate markers. The approach is shown here to be successful for fishes, but it could also be applied to any other group of organisms for which two or more complete genome sequences exist, which has important implications for assembling the tree of life.
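
The selection criteria named in the abstract (long uninterrupted exons, putatively single-copy, relatively well conserved between the two genomes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `ExonRecord` fields, the `select_candidate_markers` function, and all threshold values are hypothetical placeholders chosen for illustration, and the upper identity bound (to exclude exons too conserved to carry phylogenetic signal) is an assumption rather than something stated in the abstract. The sketch also assumes that the genome comparison itself (for example, a similarity search of one genome's exons against both genomes) has already been run and summarized per exon.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ExonRecord:
    """Hypothetical per-exon summary produced by a two-genome comparison."""
    exon_id: str
    length_bp: int            # length of the uninterrupted exon
    paralog_hits: int         # similar sequences elsewhere in the same genome
    ortholog_hits: int        # matching sequences found in the second genome
    ortholog_identity: float  # fraction identical to that match (0.0 to 1.0)


def select_candidate_markers(
    exons: Iterable[ExonRecord],
    min_length_bp: int = 800,    # placeholder threshold, not from the abstract
    min_identity: float = 0.70,  # placeholder threshold, not from the abstract
    max_identity: float = 0.95,  # placeholder threshold, not from the abstract
) -> List[str]:
    """Return IDs of exons that are long, single-copy, and moderately conserved."""
    selected = []
    for exon in exons:
        long_enough = exon.length_bp >= min_length_bp
        # "Single-copy" here means no paralog in the source genome and
        # exactly one match in the second genome.
        single_copy = exon.paralog_hits == 0 and exon.ortholog_hits == 1
        conserved = min_identity <= exon.ortholog_identity <= max_identity
        if long_enough and single_copy and conserved:
            selected.append(exon.exon_id)
    return selected


# Made-up records, used only to exercise the filter.
example = [
    ExonRecord("exon_a", 1200, 0, 1, 0.82),  # passes all three criteria
    ExonRecord("exon_b", 400, 0, 1, 0.85),   # too short
    ExonRecord("exon_c", 1500, 2, 1, 0.80),  # has paralogs
    ExonRecord("exon_d", 900, 0, 1, 0.99),   # above the identity ceiling
]
print(select_candidate_markers(example))
```

Keeping the three criteria as separate named booleans makes it easy to relax or tighten one of them when adapting the filter to another pair of genomes.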
