AlignStat: a web-tool and R package for statistical comparison of alternative multiple sequence alignments

Background: Alternative sequence alignment algorithms yield different results. It is therefore useful to quantify the similarities and differences between alternative alignments of the same sequences. These measurements can identify regions of consensus that are likely to be most informative in downstream analysis. They can also highlight systematic differences between alignments that relate to differences in the alignment algorithms themselves.

Results: Here we present a simple method for aligning two alternative multiple sequence alignments (MSAs) to one another and assessing their similarity. Differences are categorised into merges, splits or shifts in one alignment relative to the other. A set of graphical visualisations allows for intuitive interpretation of the data.

Conclusions: AlignStat enables easy one-off MSA similarity comparisons online, as well as integration of such comparisons into R pipelines. The web-tool is available at AlignStat.Science.LaTrobe.edu.au. The R package, readme and example data are available on CRAN and GitHub.com/TS404/AlignStat.
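The idea of aligning one MSA to another can be illustrated with a minimal sketch. It is written in Python rather than R and is not the AlignStat implementation: the function names (`residue_columns`, `compare_alignments`, `overall_similarity`), the toy alignments and the scoring rule are all assumptions made for illustration. Each column of a reference alignment is paired with the column of the comparison alignment that holds the most of the same residues, and the fraction of residues that land in their paired column is used as a similarity score. Classifying the remaining differences into merges, splits or shifts is not attempted here.

```python
from collections import Counter


def residue_columns(alignment, gap="-"):
    """Map (sequence index, residue ordinal) to the column holding that residue."""
    mapping = {}
    for seq_idx, row in enumerate(alignment):
        ordinal = 0
        for col, char in enumerate(row):
            if char != gap:
                mapping[(seq_idx, ordinal)] = col
                ordinal += 1
    return mapping


def compare_alignments(ref, com, gap="-"):
    """Pair each column of `ref` with the `com` column sharing the most residues.

    Returns one tuple per non-empty ref column:
    (ref_col, best_com_col, residues_shared, residues_in_ref_col).
    """
    # Both alignments must be built from the same underlying sequences.
    if [r.replace(gap, "") for r in ref] != [c.replace(gap, "") for c in com]:
        raise ValueError("alignments must contain the same ungapped sequences")

    ref_map = residue_columns(ref, gap)
    com_map = residue_columns(com, gap)

    # For every ref column, count which com columns its residues fall into.
    votes = {}
    for residue, ref_col in ref_map.items():
        votes.setdefault(ref_col, Counter())[com_map[residue]] += 1

    results = []
    for ref_col in sorted(votes):
        best_com_col, shared = votes[ref_col].most_common(1)[0]
        results.append((ref_col, best_com_col, shared, sum(votes[ref_col].values())))
    return results


def overall_similarity(results):
    """Fraction of residues placed in the column paired with their ref column."""
    shared = sum(r[2] for r in results)
    total = sum(r[3] for r in results)
    return shared / total


# Toy alignments (hypothetical data): the first sequence places its G differently.
ref = ["AC-GT", "ACAGT", "AC-GT"]
com = ["ACG-T", "ACAGT", "AC-GT"]

matches = compare_alignments(ref, com)
score = overall_similarity(matches)
```

In this toy case every reference column pairs with the comparison column of the same index, and only the G of the first sequence falls outside its paired column, so 12 of 13 residues agree.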
