Calculating bootstrap probabilities of phylogeny using multilocus sequence data.

Phylogeny estimation is crucial to the study of molecular evolution. The growing amount of available genomic data facilitates phylogeny estimation from multilocus sequence data. Although maximum likelihood and Bayesian methods are available for phylogeny reconstruction from multilocus sequence data, they are computationally heavy, and their application is limited to a moderate number of genes and taxa. Distance matrix methods are suitable alternatives for analyzing very large amounts of sequence data. However, how distance methods should be applied to multilocus sequence data remains unclear. Here, we propose new procedures, within the framework of the distance method, to estimate molecular phylogeny from multilocus sequence data and to evaluate its significance. We found that concatenating multilocus sequence data may yield an incorrect phylogeny with an extremely high bootstrap probability (BP), because the distances are estimated incorrectly and intergene variation is ignored. Therefore, we suggest that distance matrices be estimated separately for each locus and then combined to reconstruct the phylogeny, rather than reconstructing the phylogeny from concatenated sequence data. To calculate the BPs of the reconstructed phylogeny, we suggest a 2-stage bootstrap procedure in which genes are resampled first and sequence columns are then resampled within each resampled gene. Resampling the genes ensures that intergene variation is properly accounted for in the BPs. Through simulation studies and empirical data analysis, we demonstrate that our 2-stage bootstrap procedure is more suitable than the conventional bootstrap applied after sequence concatenation.
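The 2-stage resampling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: distances are plain p-distances, per-gene matrices are combined by an unweighted element-wise mean, and the tree-building step is replaced by the four-point criterion on exactly four taxa (a stand-in for a distance method such as neighbor joining). All function names are hypothetical.

```python
import random
from collections import Counter


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)


def distance_matrix(alignment):
    """Pairwise p-distances for one gene (a list of equal-length strings)."""
    n = len(alignment)
    return [[p_distance(alignment[i], alignment[j]) for j in range(n)]
            for i in range(n)]


def combine(matrices):
    """Assumption: combine per-gene matrices by an unweighted mean."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


def resample_columns(alignment, rng):
    """Stage 2: resample alignment columns with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def two_stage_replicate(genes, rng):
    """Stage 1: resample genes with replacement; then stage 2 within each."""
    sampled = [rng.choice(genes) for _ in genes]
    return [resample_columns(g, rng) for g in sampled]


def quartet_topology(d):
    """Four-point criterion for taxa 0..3: the split with the smallest sum.

    Ties are resolved in favour of the first split listed.
    """
    sums = {
        "01|23": d[0][1] + d[2][3],
        "02|13": d[0][2] + d[1][3],
        "03|12": d[0][3] + d[1][2],
    }
    return min(sums, key=sums.get)


def bootstrap_probabilities(genes, n_rep=200, seed=0):
    """Fraction of 2-stage bootstrap replicates supporting each topology."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_rep):
        rep = two_stage_replicate(genes, rng)
        d = combine([distance_matrix(g) for g in rep])
        counts[quartet_topology(d)] += 1
    return {topo: c / n_rep for topo, c in counts.items()}


if __name__ == "__main__":
    # Toy data: two genes, four taxa each (rows are taxa 0..3).
    toy_genes = [
        ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCAAAAA", "CCCCCAAAAG"],
        ["GGGGGGGG", "GGGGGGGT", "TTTGGGGG", "TTTGGGGA"],
    ]
    print(bootstrap_probabilities(toy_genes))
```

Because whole genes are drawn with replacement before any columns are drawn, a gene whose signal conflicts with the others can be absent from some replicates and duplicated in others, which is how variation among genes enters the BPs. A conventional bootstrap on the concatenated alignment would resample columns from one pooled matrix and skip stage 1 entirely.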
