Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree.

Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In the first, gene sequences are concatenated into a single super-gene alignment, which is then analyzed to infer the species tree. In the second, a phylogeny is inferred separately for each gene, and a consensus of these gene trees is used to represent the species tree. Here, we compare these two approaches by computer simulation, using 448 parameter sets that vary evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis. For each evolutionary parameter set (gene), 100 replicate datasets were generated, and the replicate whose inferred tree topology contained the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were used to compare the consensus and concatenation approaches, primarily with the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees. This holds even when the concatenated sequences have evolved under very different substitution patterns and no attempt is made to accommodate these differences during phylogenetic inference. These results appear to hold for parsimony and likelihood methods as well. With only 10 genes, the concatenation approach achieves >95% accuracy. However, this gain in accuracy is sometimes accompanied by the reinforcement of certain systematic biases. The result is spuriously high bootstrap support for incorrect partitions, whether site, gene, or combined bootstrap resampling is used. It is therefore prudent to report the number of individual genes that support each clade inferred in the concatenated sequence tree, in addition to its bootstrap support.
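
The consensus and worst-case procedures in the abstract reduce to bookkeeping on tree bipartitions (splits). The Python sketch below is illustrative only and is not the code used in this study. It rests on several assumptions:

- Input is topology-only Newick.
- The consensus is majority-rule; the abstract does not name the consensus rule.
- "Phylogenetic errors" are counted as the Robinson-Foulds number of unshared splits.

All function names (`splits`, `majority_rule`, `rf_distance`, `worst_case_index`, `gene_support`) are hypothetical. Tree inference itself (NJ, parsimony, likelihood) is not shown. Gene trees are supplied as Newick strings.

```python
# Illustrative sketch only; all helper names are hypothetical, not from the study.
from collections import Counter
from typing import FrozenSet, List, Set

Split = FrozenSet[str]  # one side of a bipartition of the taxa


def parse_newick(text: str):
    """Parse a topology-only Newick string such as '((A,B),(C,D),E);'
    into nested lists of leaf names. Branch lengths, internal node labels,
    and quoted names are NOT supported (simplifying assumption)."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [node()]
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            pos += 1  # consume ')'
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].strip()

    return node()


def leaf_set(node) -> Set[str]:
    """All leaf names below a parsed node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaf_set(child) for child in node))


def splits(newick: str) -> Set[Split]:
    """Nontrivial bipartitions of a tree, treated as unrooted. Each split is
    stored as the side that excludes the alphabetically first taxon, so the
    same split from different trees compares equal."""
    root = parse_newick(newick)
    taxa = leaf_set(root)
    anchor = min(taxa)
    found: Set[Split] = set()

    def walk(node) -> Set[str]:
        if isinstance(node, str):
            return {node}
        below = set().union(*(walk(child) for child in node))
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(below) <= len(taxa) - 2:
            found.add(frozenset(taxa - below if anchor in below else below))
        return below

    walk(root)
    return found


def majority_rule(gene_trees: List[Set[Split]]) -> Set[Split]:
    """Majority-rule consensus: splits found in more than half of the gene trees."""
    counts = Counter(s for tree in gene_trees for s in tree)
    return {s for s, n in counts.items() if n > len(gene_trees) / 2}


def rf_distance(a: Set[Split], b: Set[Split]) -> int:
    """Robinson-Foulds distance: splits present in one tree but not the other."""
    return len(a ^ b)


def worst_case_index(replicates: List[Set[Split]], true_tree: Set[Split]) -> int:
    """Index of the replicate tree with the most topological errors."""
    return max(range(len(replicates)), key=lambda i: rf_distance(replicates[i], true_tree))


def gene_support(clade: Split, gene_trees: List[Set[Split]]) -> int:
    """Number of individual gene trees that contain the given split."""
    return sum(clade in tree for tree in gene_trees)


if __name__ == "__main__":
    true_tree = splits("((A,B),(C,D),E);")
    trees = [splits(t) for t in ("((A,B),(C,D),E);", "((A,B),(C,E),D);", "((A,C),(B,D),E);")]
    print(majority_rule(trees))                       # keeps only the AB|CDE split
    print(worst_case_index(trees, true_tree))         # 2 (four unshared splits)
    print(gene_support(frozenset({"C", "D"}), trees))  # 1
```

Here `gene_support` mirrors the recommendation to report how many individual genes support each clade of the concatenated tree. Pass it a split taken from the concatenated-sequence tree, together with the list of per-gene trees. Storing every split as the side that excludes a fixed reference taxon lets splits from different trees be compared with ordinary set operations.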

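The three bootstrap schemes named in the abstract differ only in what is drawn with replacement. The following Python sketch builds one pseudo-replicate alignment under each scheme. It is a hedged illustration, not the study's implementation, and it makes these assumptions:

- "Combined" resampling means drawing genes with replacement and then resampling sites within each drawn gene.
- Every gene alignment contains all taxa.
- The function names are hypothetical.

Analyzing each replicate (for example by NJ) and tallying clade frequencies across replicates is not shown.

```python
# Illustrative sketch only; function names are hypothetical, and the
# "combined" scheme (genes, then sites within genes) is an assumption.
import random
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence (equal lengths)


def concatenate(genes: List[Alignment]) -> Alignment:
    """Join per-gene alignments end to end into one super-gene alignment.
    Assumes every gene alignment contains every taxon."""
    taxa = sorted(genes[0])
    return {t: "".join(gene[t] for gene in genes) for t in taxa}


def resample_sites(aln: Alignment, rng: random.Random) -> Alignment:
    """Draw alignment columns with replacement, keeping the original length."""
    n = len(next(iter(aln.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {t: "".join(seq[i] for i in cols) for t, seq in aln.items()}


def site_bootstrap(genes: List[Alignment], rng: random.Random) -> Alignment:
    """Resample sites from the whole concatenated alignment."""
    return resample_sites(concatenate(genes), rng)


def gene_bootstrap(genes: List[Alignment], rng: random.Random) -> Alignment:
    """Resample whole genes with replacement, then concatenate them."""
    return concatenate([rng.choice(genes) for _ in genes])


def combined_bootstrap(genes: List[Alignment], rng: random.Random) -> Alignment:
    """Resample genes with replacement, then resample sites within each drawn gene."""
    return concatenate([resample_sites(rng.choice(genes), rng) for _ in genes])


if __name__ == "__main__":
    rng = random.Random(42)
    genes = [{"A": "AAAA", "B": "AAAT"}, {"A": "CC", "B": "CG"}]
    print(concatenate(genes))  # {'A': 'AAAACC', 'B': 'AAATCG'}
    for make_replicate in (site_bootstrap, gene_bootstrap, combined_bootstrap):
        print(make_replicate.__name__, make_replicate(genes, rng))
```

The site bootstrap always preserves the total alignment length. The gene and combined schemes draw whole genes, so a replicate's length can differ from the original when genes have unequal lengths. Every scheme only ever reuses columns present in the original alignment.
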