Multiple sequence threading: an analysis of alignment quality and stability.

Methods that compare a protein sequence directly to a structure fall into two classes: those that construct a molecular model (threading methods) and those that align the sequence with the structure encoded as a sequence of structural states (one-dimensional/three-dimensional (1D/3D) matching). The former take the internal packing of the molecule into account; the latter do not. Conversely, multiple sequence data are simple to include in a 1D/3D comparison but difficult to include in a threading method. Here, a protein sequence/structure alignment method is described that combines 1D/3D matching of predicted against observed residue exposure and predicted against observed secondary structure with pairwise packing interactions in the core (threading). Using a variety of distantly related and analogous protein structures, the multiple sequence threading (MST) method was compared with a single sequence threading (SST) method (which uses complex potentials of mean force) and with a multiple sequence alignment (MSA) program. The MST method produced alignments that were better than the best obtainable with either the SST or the MSA method. The method was stable to errors in both the secondary structure prediction and the predicted exposure, and also to variation of its key parameters (fully described in an Appendix). The contribution of the pairwise term was small, but without it the correct alignments were less stable and structurally unreasonable deletions were observed when matching against larger structures. Using the parameters derived for alignment, the method recognised related folds in the structure databank with a specificity comparable to that of other methods.
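
The 1D/3D component described above can be illustrated with a minimal sketch. It is not the MST implementation: the scoring function, the weights `w_ss` and `w_exp`, the linear `gap` penalty and the names `match_score` and `align_1d3d` are all assumptions made for illustration, and the pairwise packing (threading) term is omitted entirely. The sketch only shows how predicted secondary structure and exposure for a sequence might be matched against the observed states of a structure by standard global dynamic programming.

```python
from typing import List, Sequence, Tuple


def match_score(pred_ss: str, obs_ss: str, pred_exp: float, obs_exp: float,
                w_ss: float = 1.0, w_exp: float = 1.0) -> float:
    """Hypothetical per-position score: agreement of secondary structure state
    plus closeness of exposure (both exposures assumed to lie in [0, 1])."""
    ss_term = 1.0 if pred_ss == obs_ss else 0.0
    exp_term = 1.0 - abs(pred_exp - obs_exp)
    return w_ss * ss_term + w_exp * exp_term


def align_1d3d(seq_ss: str, seq_exp: Sequence[float],
               str_ss: str, str_exp: Sequence[float],
               gap: float = 0.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment of predicted (sequence) states against observed
    (structure) states with a linear gap penalty (an assumed, simple scheme)."""
    n, m = len(seq_ss), len(str_ss)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # move[i][j] records which step produced score[i][j]: 'D', 'U' or 'L'.
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = -gap * i, "U"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = -gap * j, "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match_score(seq_ss[i - 1], str_ss[j - 1],
                            seq_exp[i - 1], str_exp[j - 1])
            candidates = [
                (score[i - 1][j - 1] + s, "D"),  # match position i with j
                (score[i - 1][j] - gap, "U"),    # sequence residue unmatched
                (score[i][j - 1] - gap, "L"),    # structure position unmatched
            ]
            # max() keeps the first of any tied candidates, so 'D' is preferred.
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])
    # Trace back from the corner to recover the matched index pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "D":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "U":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Usage: a five-residue prediction matched against a six-position structure.
total, pairs = align_1d3d("HHEEE", [0.0, 0.0, 1.0, 1.0, 1.0],
                          "HHHEEE", [0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
```

In a multiple sequence setting, the predicted states passed in would be derived from an aligned family rather than a single sequence, which is the property that makes multiple sequence data simple to include in this kind of comparison. A threading term would add a score that depends on pairs of aligned positions, which this position-independent recursion cannot express directly.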
