Evaluation of iterative alignment algorithms for multiple alignment

MOTIVATION: Iteration has often been used as an optimization method for producing multiple alignments, either alone or in combination with other methods. Its great advantage is simplicity, both in coding the algorithms and in time and memory requirements. In this paper, we systematically test several iteration strategies by comparing their results on sets of alignment test cases.

RESULTS: We tested three schemes in which iteration is used to improve an existing alignment. This was found to be remarkably effective and could induce a significant improvement in the accuracy of alignments from most packages. For example, the average accuracy of ClustalW was improved by over 6% on the hardest test cases. Iteration was even more powerful when incorporated directly into a progressive alignment scheme, where it was used to improve the subalignments at each step. The beneficial effects of iteration come, in part, from its ability to get round the usual local minimum problem of progressive alignment. This ability can also be used to reduce the complexity of T-Coffee without losing accuracy: T-Coffee can be used to align subgroups of sequences, and the resulting alignments can then be iteratively improved and merged.

AVAILABILITY: All of the scripts are freely available on the web at http://www.bioinf.ucd.ie/people/iain/iteration.html

CONTACT: iain.wallace@ucd.ie
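The idea of using iteration to improve an existing alignment can be illustrated with a small sketch. It is not one of the authors' scripts, and the abstract does not specify the three schemes tested; the code below assumes a simple leave-one-out strategy in which each sequence is removed in turn, realigned to the remaining sequences, and the new alignment is kept only if a sum-of-pairs score increases. The scoring values (match +1, mismatch -1, residue against gap -2, linear gap costs) and all function names are hypothetical choices made for illustration.

```python
from itertools import combinations

GAP = "-"

def pair_score(a, b):
    """Toy pair scores (assumed): match +1, mismatch -1, residue vs gap -2."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1

def sp_score(rows):
    """Sum-of-pairs score of an alignment given as equal-length strings."""
    return sum(pair_score(a, b)
               for col in zip(*rows)
               for a, b in combinations(col, 2))

def strip_gap_columns(rows):
    """Drop columns that consist only of gaps."""
    cols = [c for c in zip(*rows) if any(ch != GAP for ch in c)]
    return ["".join(r) for r in zip(*cols)] if cols else ["" for _ in rows]

def align_to_profile(seq, rows):
    """Globally align an ungapped sequence to a fixed sub-alignment (rows)."""
    cols = list(zip(*rows))
    n, m, k = len(seq), len(cols), len(rows)

    def sub(i, j):
        # Score of placing residue seq[i] in existing column j.
        return sum(pair_score(seq[i], c) for c in cols[j])

    skip = [sum(pair_score(GAP, c) for c in col) for col in cols]  # gap in seq
    ins = -2 * k  # residue of seq placed in a new all-gap column

    # Fill the dynamic-programming table.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + ins
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + skip[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i - 1, j - 1),
                          S[i - 1][j] + ins,
                          S[i][j - 1] + skip[j - 1])

    # Trace back from the bottom-right corner to recover the alignment.
    i, j, new_seq, new_cols = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i - 1, j - 1):
            new_seq.append(seq[i - 1]); new_cols.append(cols[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + ins:
            new_seq.append(seq[i - 1]); new_cols.append((GAP,) * k)
            i -= 1
        else:
            new_seq.append(GAP); new_cols.append(cols[j - 1])
            j -= 1
    new_seq.reverse(); new_cols.reverse()
    return "".join(new_seq), ["".join(r) for r in zip(*new_cols)]

def refine(rows, max_rounds=10):
    """Leave-one-out refinement: accept a realignment only if SP score rises."""
    rows = strip_gap_columns(rows)
    best = sp_score(rows)
    for _ in range(max_rounds):
        improved = False
        for idx in range(len(rows)):
            seq = rows[idx].replace(GAP, "")
            rest = strip_gap_columns(rows[:idx] + rows[idx + 1:])
            new_seq, new_rest = align_to_profile(seq, rest)
            cand = new_rest[:idx] + [new_seq] + new_rest[idx:]
            score = sp_score(cand)
            if score > best:
                rows, best, improved = cand, score, True
        if not improved:
            break  # a full pass changed nothing, so stop iterating
    return rows, best

# A deliberately misaligned input: the third sequence is shifted by one column.
start = ["ACGT-", "ACGT-", "-ACGT"]
refined, score = refine(start)
```

In this toy case the shifted third sequence is pulled back into register on the first pass. The sketch assumes at least two sequences, each containing at least one residue. Because a realignment is accepted only when the score strictly increases, the loop cannot cycle, but it can still stop at a local optimum of the chosen objective.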
