Recent trends in molecular phylogenetic analysis: where to next?

The acquisition of large multilocus sequence data sets is providing researchers with an unprecedented amount of information with which to resolve difficult phylogenetic problems. With these large quantities of data comes the growing challenge of choosing the best methods of analysis. We review current trends in molecular phylogenetic analysis, focusing on multiple sequence alignment and methods of tree reconstruction. We suggest that traditional methods are inadequate for these highly heterogeneous data sets and that researchers should employ newer, more sophisticated search algorithms in their analyses. If we are to best extract the information present in these data sets, a sound understanding of basic phylogenetic principles, combined with modern methodological techniques, is necessary.
