BMC Bioinformatics BioMed Central

BackgroundWe present a complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared to the previous version 2.2 of DIALIGN. This previous version is superior to Needleman-Wunsch-based multi-alignment programs on locally related sequence sets. However, it is often outperformed by these methods on data sets with global but weak similarity at the primary-sequence level.ResultsIn the present paper, we discuss strengths and weaknesses of DIALIGN in view of the underlying objective function. Based on these results, we propose several heuristics to improve the segment-based alignment approach. For pairwise alignment, we implemented a fragment-chaining algorithm that favours chains of low-scoring local alignments over isolated high-scoring fragments. For multiple alignment, we use an improved greedy procedure that is less sensitive to spurious local sequence similarities. To evaluate our method on globally related protein families, we used the well-known database BAliBASE. For benchmarking tests on locally related sequences, we created a new reference database called IRMBASE which consists of simulated conserved motifs implanted into non-related random sequences.ConclusionOn BAliBASE, our new program performs significantly better than the previous version of DIALIGN and is comparable to the standard global aligner CLUSTAL W, though it is outperformed by some newly developed programs that focus on global alignment. On the locally related test sets in IRMBASE, our method outperforms all other programs that we evaluated.

[1]  Benjamin J. Raphael,et al.  A novel method for multiple alignment of sequences with repeated and shuffled elements. , 2004, Genome research.

[2]  Erik L L Sonnhammer,et al.  Quality assessment of multiple alignment programs , 2002, FEBS letters.

[3]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[4]  Lode Wyns,et al.  SABmark- a benchmark for sequence alignment that covers the entire known fold space , 2005, Bioinform..

[5]  Ernest Feytmans,et al.  MATCH-BOX: a fundamentally new algorithm for the simultaneous alignment of several protein sequences , 1992, Comput. Appl. Biosci..

[6]  Burkhard Morgenstern,et al.  DIALIGN: multiple DNA and protein sequence alignment at BiBiServ , 2004, Nucleic Acids Res..

[7]  Inna Dubchak,et al.  Glocal alignment: finding rearrangements during alignment , 2003, ISMB.

[8]  Burkhard Morgenstern,et al.  DIALIGN2: Improvement of the segment to segment approach to multiple sequence alignment , 1999, German Conference on Bioinformatics.

[9]  Sean R. Eddy,et al.  Biological sequence analysis: Contents , 1998 .

[10]  Olivier Poch,et al.  BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs , 1999, Bioinform..

[11]  Lode Wyns,et al.  Align-m-a new algorithm for multiple alignment of highly divergent sequences , 2004, Bioinform..

[12]  Michael Brudno,et al.  Fast and sensitive multiple alignment of large genomic sequences , 2003, BMC Bioinformatics.

[13]  Michael Brudno,et al.  PROBCONS: Probabilistic Consistency-Based Multiple Alignment of Amino Acid Sequences , 2004, AAAI.

[14]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[15]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[16]  T. Speed,et al.  Biological Sequence Analysis , 1998 .

[17]  Jun S. Liu,et al.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[18]  M S Waterman,et al.  Multiple sequence alignment by consensus. , 1986, Nucleic acids research.

[19]  Jens Stoye,et al.  Benchmarking tools for the alignment of functional noncoding DNA , 2004, BMC Bioinformatics.

[20]  Folker Meyer,et al.  Rose: generating sequence families , 1998, Bioinform..

[21]  Burkhard Morgenstern,et al.  A simple and space-efficient fragment-chaining algorithm for alignment of DNA and protein sequences , 2002, Appl. Math. Lett..

[22]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[23]  Christopher J. Lee,et al.  Multiple sequence alignment using partial order graphs , 2002, Bioinform..

[24]  Burkhard Morgenstern,et al.  Speeding Up the DIALIGN Multiple Alignment Program by Using the 'Greedy Alignment of BIOlogical Sequences LIBrary' (GABIOS-LIB) , 2000, JOBIM.

[25]  R. Guigó,et al.  An assessment of gene prediction accuracy in large DNA sequences. , 2000, Genome research.

[26]  Olivier Poch,et al.  A comprehensive comparison of multiple sequence alignment programs , 1999, Nucleic Acids Res..

[27]  O. Gotoh Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. , 1996, Journal of molecular biology.

[28]  F. Corpet Multiple sequence alignment with hierarchical clustering. , 1988, Nucleic acids research.

[29]  A. Dress,et al.  Multiple DNA and protein sequence alignment based on segment-to-segment comparison. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[30]  Gajendra P. S. Raghava,et al.  OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy , 2003, BMC Bioinformatics.