A structural pattern‐based method for protein fold recognition

A method (SPREK) was developed to evaluate the register of a sequence on a structure by matching structural patterns against a library derived from the protein structure databank. The scores obtained were normalized against random background distributions derived from sequence shuffling and permutation methods. ‘Random’ structures were also used to evaluate the effectiveness of the method; these were generated by a simple random‐walk and by a more sophisticated structure prediction method that produced protein‐like folds. For comparison with other methods, performance was assessed using collections of models including decoys and models from the CASP‐5 exercise. The performance of SPREK on the decoy models was equivalent to (and sometimes better than) that obtained with more complex approaches. An exception was the two smallest proteins, for which SPREK did not perform well due to a lack of patterns. Using the best parameter combination from trials on decoy models, the CASP models of intermediate difficulty were evaluated by SPREK, and the quality of the top‐scoring model was assessed by its CASP ranking. Of the 14 targets in this class, half of the top‐scoring models ranked in the top 10% (out of around 140 models for each target). The two worst rankings resulted from the selection by our method of a well‐packed model that was based on the wrong fold. Of the other poor rankings, one was the smallest protein and the others were the four largest (all over 250 residues). Proteins 2004. © 2004 Wiley‐Liss, Inc.
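The normalization against a shuffled-sequence background described above can be illustrated with a minimal sketch. It is not the SPREK scoring function: `toy_burial_score`, the burial string, the number of shuffles, and the use of a z-score are all assumptions chosen only to show the general idea of comparing a native score with scores from composition-preserving shuffles.

```python
import random
import statistics


def shuffled_background_zscore(sequence, score_fn, n_shuffles=200, seed=0):
    """Z-score of score_fn(sequence) against scores of shuffled copies.

    Shuffling keeps the amino-acid composition and destroys the order, so
    the background reflects what composition alone would score.
    """
    rng = random.Random(seed)
    native = score_fn(sequence)
    residues = list(sequence)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(score_fn("".join(residues)))
    mean = statistics.fmean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        # A flat background carries no information; report no signal.
        return 0.0
    return (native - mean) / sd


def toy_burial_score(sequence, burial):
    """Hypothetical stand-in score: hydrophobic residues at buried sites.

    `burial` is a string of 'B' (buried) and 'E' (exposed), one per residue.
    """
    hydrophobic = set("AVLIMFWC")
    return sum(
        1 for aa, site in zip(sequence, burial)
        if site == "B" and aa in hydrophobic
    )


# Assumed toy data: hydrophobic residues sit exactly on the buried sites.
BURIAL = "BBBBBEEEEE" * 2
SEQ = "LLLLLKKKKK" * 2
z = shuffled_background_zscore(SEQ, lambda s: toy_burial_score(s, BURIAL))
print(f"z-score against shuffled background: {z:.2f}")
```

A fixed seed keeps the sketch reproducible; in practice the number of shuffles would be chosen large enough for the background mean and spread to stabilise.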
