MSARC: Multiple sequence alignment by residue clustering

BackgroundProgressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, resulting alignments are biased by guide-trees, especially for relatively distant sequences.ResultsWe propose MSARC, a new graph-clustering based algorithm that aligns sequence sets without guide-trees. Experiments on the BAliBASE dataset show that MSARC achieves alignment quality similar to the best progressive methods.Furthermore, MSARC outperforms them on sequence sets whose evolutionary distances are difficult to represent by a phylogenetic tree. These datasets are most exposed to the guide-tree bias of alignments.AvailabilityMSARC is available at http://bioputer.mimuw.edu.pl/msarc

[1]  D. Higgins,et al.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega , 2011, Molecular systems biology.

[2]  Chuong B. Do,et al.  ProbCons: Probabilistic consistency-based multiple sequence alignment. , 2005, Genome research.

[3]  M. Suchard,et al.  Joint Bayesian estimation of alignment and phylogeny. , 2005, Systematic biology.

[4]  O. Gotoh An improved algorithm for matching biological sequences. , 1982, Journal of molecular biology.

[5]  O. Gascuel,et al.  New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. , 2010, Systematic biology.

[6]  Bruce Hendrickson,et al.  A Multi-Level Algorithm For Partitioning Graphs , 1995, Proceedings of the IEEE/ACM SC95 Conference.

[7]  G. Gonnet,et al.  Exhaustive matching of the entire protein sequence database. , 1992, Science.

[8]  István Miklós,et al.  Bayesian coestimation of phylogeny and sequence alignment , 2005, BMC Bioinformatics.

[9]  Peter F. Stadler,et al.  Stochastic pairwise alignments , 2002, ECCB.

[10]  Serita M. Nelesen,et al.  Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees , 2009, Science.

[11]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[12]  K. Katoh,et al.  MAFFT version 5: improvement in accuracy of multiple sequence alignment , 2005, Nucleic acids research.

[13]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[14]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[15]  John D. Kececioglu,et al.  The Maximum Weight Trace Problem in Multiple Sequence Alignment , 1993, CPM.

[16]  A. Löytynoja,et al.  Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis , 2008, Science.

[17]  Olivier Poch,et al.  BAliBASE 3.0: Latest developments of the multiple sequence alignment benchmark , 2005, Proteins.

[18]  Michael Kaufmann,et al.  BMC Bioinformatics BioMed Central , 2005 .

[19]  Dennis R. Livesay,et al.  Probalign: multiple sequence alignment using partition function posterior probabilities , 2006, Bioinform..

[20]  Terence Hwa,et al.  Statistical Significance of Probabilistic Sequence Alignment and Related Local Hidden Markov Models , 2001, J. Comput. Biol..

[21]  Lior Pachter,et al.  Fast Statistical Alignment , 2009, PLoS Comput. Biol..

[22]  Yongchao Liu,et al.  MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities , 2010, Bioinform..

[23]  Michael Kaufmann,et al.  DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment , 2008, Algorithms for Molecular Biology.

[24]  Byung-Jun Yoon,et al.  PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences , 2010, Nucleic acids research.

[25]  R. M. Mattheyses,et al.  A Linear-Time Heuristic for Improving Network Partitions , 1982, 19th Design Automation Conference.

[26]  S. Miyazawa A reliable sequence alignment method based on probabilities of residue correspondences. , 1995, Protein engineering.

[27]  M. Suchard,et al.  Alignment Uncertainty and Genomic Analysis , 2008, Science.