Reconstruction of Genuine Pair-Wise Sequence Alignment

In many applications, an algorithmically obtained alignment should ideally restore the "gold standard" (GS) alignment, which superimposes positions that originate from the same position in the common ancestor of the compared sequences. The average similarity between the algorithmically obtained and GS alignments ("the quality") is an important characteristic of an alignment algorithm. We propose to determine the quality of an algorithm using sequences that are artificially generated in accordance with an appropriate evolution model; the approach was applied to the global version of the Smith-Waterman algorithm (SWA). The quality of SWA ranges from 97% (at a PAM distance of 60) to 70% (at a PAM distance of 300). The percentage of identical aligned residues is the same for algorithmic and GS alignments. The total length of indels in algorithmic alignments is less than in GS alignments, mainly because algorithmic alignments contain substantially fewer indels.
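
The abstract does not spell out how the similarity between two alignments is measured. As a minimal sketch, assume that "quality" is the fraction of residue pairs aligned in the GS alignment that are also aligned in the algorithmic one. This definition, the gap character `-`, and the function names `aligned_pairs` and `alignment_quality` are illustrative assumptions, not the authors' exact procedure.

```python
from typing import Set, Tuple


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Set[Tuple[int, int]]:
    """Return the set of (i, j) residue-index pairs superimposed by an alignment.

    row_a and row_b are the two gapped rows of a pairwise alignment;
    i and j are 0-based positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))  # residue i of A is aligned with residue j of B
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def alignment_quality(algorithmic: Tuple[str, str], gs: Tuple[str, str]) -> float:
    """Fraction of GS-aligned residue pairs recovered by the algorithmic alignment.

    Assumed definition for illustration only.
    """
    gs_pairs = aligned_pairs(*gs)
    if not gs_pairs:
        return 0.0
    alg_pairs = aligned_pairs(*algorithmic)
    return len(gs_pairs & alg_pairs) / len(gs_pairs)


# Hypothetical toy example: the same two sequences aligned in two ways.
gs_alignment = ("AC-GT", "ACTGT")
algorithmic_alignment = ("ACG-T", "ACTGT")
print(alignment_quality(algorithmic_alignment, gs_alignment))  # 0.75
```

In this toy case three of the four GS pairs are recovered, so the value is 0.75. Averaging the measure over many simulated sequence pairs at a given PAM distance would give a per-distance figure of the kind reported above.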
