Phylogeny-Aware Alignment with PRANK and PAGAN

Evolutionary analyses require sequence alignments that correctly represent evolutionary homology. Evolutionary homology is not the same as structural similarity between proteins, and sequence alignments generated with methods designed for structural matching can be seriously misleading in comparative and phylogenetic analyses. The phylogeny-aware alignment algorithm implemented in the program PRANK has been shown to produce alignments well suited to evolutionary inference. Unlike other alignment programs, PRANK uses phylogenetic information to distinguish alignment gaps caused by insertions from those caused by deletions and then handles the two types of events differently. As a by-product of this correct handling of insertions and deletions, PRANK can provide the inferred ancestral sequences as part of its output and mark alignment gaps differently depending on whether they originate from insertion or deletion events. Because the algorithm infers the evolutionary history of the sequences, PRANK can be sensitive to errors in the guide phylogeny and to violations of the underlying assumptions about the origin and patterns of gaps. To mitigate the effects of such model violations, the phylogeny-aware alignment algorithm has been re-implemented in the program PAGAN. By using sequence graphs, PAGAN can model and accumulate evidence from more complex gap structures than PRANK can, and it incorporates this uncertainty in the inferred ancestral sequences. These issues are discussed in detail below, and practical advice is provided for the use of PRANK and PAGAN in evolutionary analysis. The two software packages can be downloaded from http://wasabiapp.org/software .
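The idea that a phylogeny can tell an insertion from a deletion can be illustrated with a toy example. The sketch below is not the algorithm used by PRANK or PAGAN; it substitutes simple Fitch parsimony on the presence or absence of a character in one alignment column, and the function names, the nested-tuple tree format, and the species labels are all hypothetical choices made for illustration. It assumes a rooted, strictly binary tree.

```python
def fitch_root_states(tree, present):
    """Fitch parsimony state set at the root for one alignment column.

    `tree` is a leaf name (str) or a pair of subtrees (assumed format).
    `present` maps each leaf name to True (character) or False (gap).
    """
    if isinstance(tree, str):
        return {present[tree]}
    left, right = (fitch_root_states(child, present) for child in tree)
    common = left & right
    # Keep the intersection when the children agree, else the union.
    return common if common else left | right


def classify_gap(tree, present):
    """Label a gapped column as an insertion, a deletion, or ambiguous."""
    observed = set(present.values())
    if observed == {True}:
        return "no gap"
    if observed == {False}:
        return "all gaps"
    root = fitch_root_states(tree, present)
    if root == {False}:
        return "insertion"   # root lacked the character; it was gained later
    if root == {True}:
        return "deletion"    # root had the character; some lineage lost it
    return "ambiguous"       # parsimony cannot decide the root state


# Hypothetical rooted tree with an outgroup as the last leaf.
tree = ((("human", "chimp"), "mouse"), "chicken")

# Character seen only in human and chimp: most simply explained as a gain.
print(classify_gap(tree, {"human": True, "chimp": True,
                          "mouse": False, "chicken": False}))

# Character missing only in mouse: most simply explained as a loss.
print(classify_gap(tree, {"human": True, "chimp": True,
                          "mouse": False, "chicken": True}))
```

The two columns contain the same kind of gap character in a plain alignment, yet the tree assigns them different histories, which is the distinction the abstract describes. A method without the tree has no basis for treating them differently.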
