Iterative refinement of structure-based sequence alignments by Seed Extension

BackgroundAccurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment.ResultsRSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs.ConclusionRSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs.

[1]  B Honig,et al.  An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. , 2000, Journal of molecular biology.

[2]  Douglas L. Brutlag,et al.  FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web , 2004, Nucleic Acids Res..

[3]  A. Sali,et al.  Alignment of protein sequences by their profiles , 2004, Protein science : a publication of the Protein Society.

[4]  Changhoon Kim,et al.  SE: an algorithm for deriving sequence alignment from a pair of superimposed structures , 2009, BMC Bioinformatics.

[5]  Michael J E Sternberg,et al.  Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre , 2008, Proteins.

[6]  Lenore Cowen,et al.  Matt: Local Flexibility Aids Protein Multiple Structure Alignment , 2008, PLoS Comput. Biol..

[7]  William R. Taylor,et al.  Structure Comparison and Structure Patterns , 2000, J. Comput. Biol..

[8]  Conrad C. Huang,et al.  UCSF Chimera—A visualization system for exploratory research and analysis , 2004, J. Comput. Chem..

[9]  Zhiping Weng,et al.  FAST: A novel protein structure alignment algorithm , 2004, Proteins.

[10]  Changhoon Kim,et al.  Accuracy of structure-based sequence alignment of automatic methods , 2007, BMC Bioinformatics.

[11]  Anna R. Panchenko,et al.  Erratum to: State of the art: refinement of multiple sequence alignments , 2006, BMC Bioinformatics.

[12]  Erik L. L. Sonnhammer,et al.  Automatic assessment of alignment quality , 2005, Nucleic acids research.

[13]  A. Godzik The structural alignment between two proteins: Is there a unique answer? , 1996, Protein science : a publication of the Protein Society.

[14]  Narmada Thanki,et al.  CDD: a conserved domain database for interactive domain family analysis , 2006, Nucleic Acids Res..

[15]  G. Barton,et al.  Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels , 1992, Proteins.

[16]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[17]  W. Kabsch A solution for the best rotation to relate two sets of vectors , 1976 .

[18]  Adam Godzik,et al.  FATCAT: a web server for flexible structure comparison and structure similarity searching , 2004, Nucleic Acids Res..

[19]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[20]  An-Suei Yang,et al.  Structure-dependent sequence alignment for remotely related proteins , 2002, Bioinform..

[21]  B. Honig,et al.  An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments. , 2000, Journal of molecular biology.

[22]  Byungkook Lee,et al.  Context‐specific amino acid substitution matrices and their use in the detection of protein homologs , 2008, Proteins.

[23]  Peter Lackner,et al.  Comparative Analysis of Protein Structure Alignments , 2007, BMC Structural Biology.

[24]  Jean-Christophe Gelly,et al.  EvDTree: structure-dependent substitution profiles based on decision tree classification of 3D environments , 2005, BMC Bioinformatics.

[25]  J. Jung,et al.  Protein structure alignment using environmental profiles. , 2000, Protein engineering.

[26]  W. Kabsch A discussion of the solution for the best rotation to relate two sets of vectors , 1978 .

[27]  Takeshi Kawabata,et al.  MATRAS: a program for protein 3D structure comparison , 2003, Nucleic Acids Res..

[28]  P E Bourne,et al.  Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. , 1998, Protein engineering.

[29]  M Levitt,et al.  Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins , 1998, Protein science : a publication of the Protein Society.

[30]  W. Pearson Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. , 1991, Genomics.

[31]  S. Bryant,et al.  Threading a database of protein cores , 1995, Proteins.

[32]  Liisa Holm,et al.  DaliLite workbench for protein structure comparison , 2000, Bioinform..