Protein Fold Recognition using Residue-Based Alignments of Sequence and Secondary Structure

Protein structure prediction aims to determine the three-dimensional structure of proteins from their amino acid sequences. When a protein does not have similarity (homology) to any known fold, threading or fold recognition methods are used to predict structure. Fold recognition methods frequently employ secondary structure, solvent accessibility, and evolutionary information to enhance the accuracy and quality of the predictions. In this paper, we present a residue-based alignment method as an alternative to the state-of-the-art secondary structure element alignment (SSEA) method, originally introduced by Przytycka et al. and further modified by McGuffin et al. We introduce a residue-based score function that can incorporate amino acid similarity matrices such as BLOSUM into secondary structure similarity scoring and compute joint alignments. We show that the power of the SSEA method comes from its length normalization rather than from the element alignment technique, and that similar performance can be achieved with residue-based alignments of secondary structures by optimizing gap costs. In simulations with two benchmark datasets, our method performs slightly better than SSEA in terms of fold recognition accuracy. When the secondary structure similarity matrix is combined with the amino acid based BLOSUM30 matrix, the accuracy of our method improves further (4% for the McGuffin set and 10% for the Ding and Dubchak set). The ability to align the amino acid and secondary structure sequences jointly offers a better starting point for more elaborate techniques that employ profile-profile alignments and machine learning methods.
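
To make the idea of a joint residue-based score concrete, the following is a minimal sketch, not the authors' implementation. It assumes a global (Needleman-Wunsch style) alignment with a linear gap cost, a weighted sum of a secondary structure term and an amino acid term, and normalization by the mean sequence length. The function names, the weight parameter, and all score values are hypothetical placeholders; in particular, aa_score is a toy identity score standing in for a real matrix such as BLOSUM30, and the exact normalization used by SSEA may differ.

```python
def ss_score(a: str, b: str) -> float:
    """Toy secondary structure similarity over H (helix), E (strand), C (coil).
    The values are illustrative placeholders, not taken from the paper."""
    if a == b:
        return 2.0
    if "C" in (a, b):
        return 0.0   # coil against helix or strand: neutral
    return -2.0      # helix against strand: penalized


def aa_score(a: str, b: str) -> float:
    """Toy amino acid similarity (identity score), a stand-in for BLOSUM30."""
    return 1.0 if a == b else -1.0


def joint_score(aa1: str, ss1: str, aa2: str, ss2: str, weight: float = 0.5) -> float:
    """Weighted sum of the secondary structure and amino acid terms (assumed form)."""
    return weight * ss_score(ss1, ss2) + (1.0 - weight) * aa_score(aa1, aa2)


def align_score(aa_x: str, ss_x: str, aa_y: str, ss_y: str,
                gap: float = 1.0, weight: float = 0.5) -> float:
    """Global alignment score with a linear gap cost, one residue per column."""
    if len(aa_x) != len(ss_x) or len(aa_y) != len(ss_y):
        raise ValueError("amino acid and secondary structure strings must match in length")
    n, m = len(aa_x), len(aa_y)
    prev = [-gap * j for j in range(m + 1)]        # row 0: only gaps
    for i in range(1, n + 1):
        curr = [-gap * i] + [0.0] * m              # column 0: only gaps
        for j in range(1, m + 1):
            match = prev[j - 1] + joint_score(
                aa_x[i - 1], ss_x[i - 1], aa_y[j - 1], ss_y[j - 1], weight)
            curr[j] = max(match, prev[j] - gap, curr[j - 1] - gap)
        prev = curr
    return prev[m]


def normalized_score(aa_x: str, ss_x: str, aa_y: str, ss_y: str,
                     gap: float = 1.0, weight: float = 0.5) -> float:
    """Raw score divided by the mean sequence length (an assumed normalization)."""
    mean_len = (len(aa_x) + len(aa_y)) / 2.0
    if mean_len == 0:
        return 0.0
    return align_score(aa_x, ss_x, aa_y, ss_y, gap, weight) / mean_len


if __name__ == "__main__":
    print(normalized_score("ACDE", "HHEC", "ACDE", "HHEC"))
```

Setting weight to 1.0 reduces the sketch to a pure secondary structure alignment, so the same routine can be used to compare scoring with and without the amino acid term, and the gap parameter can be tuned separately for each setting.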

[1] Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1998.

[2] Ralf Zimmer, et al. Combining Secondary Structure Element Alignment and Profile-Profile Alignment for Fold Recognition, 2004, German Conference on Bioinformatics.

[3] Chris H. Q. Ding, et al. Multi-class protein fold recognition using support vector machines and neural networks, 2001, Bioinform.

[4] David C. Jones, et al. CATH--a hierarchic classification of protein domain structures, 1997, Structure.

[5] Liam J. McGuffin, et al. What are the baselines for protein fold recognition?, 2001, Bioinform.

[6] George D. Rose, et al. A protein taxonomy based on secondary structure, 1999, Nature Structural Biology.

[7] Silvio C. E. Tosatto, et al. MANIFOLD: protein fold recognition based on secondary structure, sequence similarity and enzyme classification, 2003, Protein engineering.

[8] John B. Shoven, et al. I, Edinburgh Medical and Surgical Journal.

[9] Liam J. McGuffin, et al. Targeting novel folds for structural genomics, 2002, Proteins.

[10] Ronald M. Levy, et al. Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases, 2000, Bioinform.

[11] Ziding Zhang, et al. Descriptor‐based protein remote homology identification, 2005, Protein science: a publication of the Protein Society.

[12] S. F. Altschul, et al. Local alignment statistics, 1996, Methods in enzymology.