Combining evolutionary and structural information for local protein structure prediction

We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non‐redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position‐specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS‐type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile–profile comparison. We show that the COMPASS‐type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side‐chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log‐odds score level is not as effective as combination at the frequency level. This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log‐odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. Mixture of predicted structural frequencies and evolutionary frequencies improve the quality of local profile‐to‐profile alignment by COMPASS. Proteins 2004. © 2004 Wiley‐Liss, Inc.

[1]  M. Levitt,et al.  Small libraries of protein fragments model native protein structures accurately. , 2002, Journal of molecular biology.

[2]  J. Skolnick,et al.  Ab initio protein structure prediction via a combination of threading, lattice folding, clustering, and structure refinement , 2001, Proteins.

[3]  Jimin Pei,et al.  AL2CO: calculation of positional conservation in a protein sequence alignment , 2001, Bioinform..

[4]  Liisa Holm,et al.  Picasso: generating a covering set of protein family profiles , 2001, Bioinform..

[5]  J L Sussman,et al.  A 3D building blocks approach to analyzing and predicting structure of proteins , 1989, Proteins.

[6]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[7]  R. Aurora,et al.  Helix capping , 1998, Protein science : a publication of the Protein Society.

[8]  D. Fischer,et al.  Protein fold recognition using sequence‐derived predictions , 1996, Protein science : a publication of the Protein Society.

[9]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[10]  D. Baker,et al.  Prediction of local structure in proteins using a library of sequence-structure motifs. , 1998, Journal of molecular biology.

[11]  D Eisenberg,et al.  A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. , 1997, Journal of molecular biology.

[12]  D. Baker,et al.  Recurring local sequence motifs in proteins. , 1995, Journal of molecular biology.

[13]  Roland L. Dunbrack,et al.  Backbone-dependent rotamer library for proteins. Application to side-chain prediction. , 1993, Journal of molecular biology.

[14]  Y Shan,et al.  Fold recognition and accurate query‐template alignment by a combination of PSI‐BLAST and threading , 2001, Proteins.

[15]  A. Sali,et al.  Evolution and physics in comparative protein structure modeling. , 2002, Accounts of chemical research.

[16]  J. Thornton,et al.  A revised set of potentials for beta-turn formation in proteins. , 1994, Protein science : a publication of the Protein Society.

[17]  B K Shoichet,et al.  A relationship between protein stability and protein function. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[18]  C. Etchebest,et al.  Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks , 2000, Proteins.

[19]  A. D. McLachlan,et al.  Solvation energy in protein folding and binding , 1986, Nature.

[20]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[21]  Golan Yona,et al.  Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. , 2002, Journal of molecular biology.

[22]  T. Alwyn Jones,et al.  CASP3 comparative modeling evaluation , 1999, Proteins.

[23]  M. Sippl Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. , 1990, Journal of molecular biology.

[24]  B. Rost,et al.  A modified definition of Sov, a segment‐based measure for protein secondary structure prediction assessment , 1999, Proteins.

[25]  J. Ponder,et al.  Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. , 1987, Journal of molecular biology.

[26]  Richard Bonneau,et al.  Rosetta in CASP4: Progress in ab initio protein structure prediction , 2001, Proteins.

[27]  S L Mayo,et al.  Intrinsic beta-sheet propensities result from van der Waals interactions between side chains and the local backbone. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[28]  K Karplus,et al.  What is the value added by human intervention in protein structure prediction? , 2001, Proteins.

[29]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[30]  E. Lander,et al.  Protein secondary structure prediction using nearest-neighbor methods. , 1993, Journal of molecular biology.

[31]  A A Salamov,et al.  Protein secondary structure prediction using local alignments. , 1997, Journal of molecular biology.

[32]  V. Thorsson,et al.  HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. , 2000, Journal of molecular biology.

[33]  A Tramontano,et al.  Structural conservation in single-domain proteins: implications for homology modeling. , 2001, Journal of structural biology.

[34]  J. Thornton,et al.  A revised set of potentials for β‐turn formation in proteins , 1994 .

[35]  Chris Sander,et al.  Touring protein fold space with Dali/FSSP , 1998, Nucleic Acids Res..

[36]  J M Thornton,et al.  Molecular recognition. Conformational analysis of limited proteolytic sites and serine proteinase protein inhibitors. , 1991, Journal of molecular biology.

[37]  Leszek Rychlewski,et al.  Improving the quality of twilight‐zone alignments , 2000, Protein science : a publication of the Protein Society.

[38]  An-Suei Yang,et al.  Local Structure Prediction with Local Structure-based Sequence Profiles , 2003, Bioinform..

[39]  Sean R. Eddy,et al.  Profile hidden Markov models , 1998, Bioinform..

[40]  Ronald Breslow,et al.  Molecular recognition , 1993, Proceedings of the National Academy of Sciences of the United States of America.

[41]  M J Sippl,et al.  Knowledge-based potentials for proteins. , 1995, Current opinion in structural biology.

[42]  Richard Bonneau,et al.  Ab initio protein structure prediction: progress and prospects. , 2001, Annual review of biophysics and biomolecular structure.

[43]  Rama Ranganathan,et al.  Knowledge-based potential functions in protein design. , 2002, Current opinion in structural biology.

[44]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[45]  C Kooperberg,et al.  Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. , 1997, Journal of molecular biology.

[46]  Eugene I. Shakhnovich,et al.  A structure-based method for derivation of all-atom potentials for protein folding , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[47]  N. Grishin,et al.  COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. , 2003, Journal of molecular biology.

[48]  R. Srinivasan,et al.  A physical basis for protein secondary structure. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[49]  R. Jernigan,et al.  Self‐consistent estimation of inter‐residue protein contact energies based on an equilibrium mixture approximation of residues , 1999, Proteins.

[50]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[51]  M. Karplus,et al.  PDB-based protein loop prediction: parameters for selection and methods for optimization. , 1997, Journal of molecular biology.

[52]  S. Sunyaev,et al.  PSIC: profile extraction from sequence alignments with position-specific counts of independent observations. , 1999, Protein engineering.

[53]  John P. Overington,et al.  Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. , 1993, Journal of molecular biology.

[54]  T L Blundell,et al.  FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. , 2001, Journal of molecular biology.

[55]  An-Suei Yang,et al.  Local structure-based sequence profile database for local and global protein structure predictions , 2002, Bioinform..

[56]  D. Baker,et al.  An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. , 2003, Journal of molecular biology.

[57]  Shankar Subramaniam,et al.  Protein fragment clustering and canonical local shapes , 2003, Proteins.

[58]  Pierre Baldi,et al.  Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles , 2002, Proteins.

[59]  R. Samudrala,et al.  An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. , 1998, Journal of molecular biology.

[60]  D. Shortle Composites of local structure propensities: evidence for local encoding of long-range structure. , 2002, Protein science : a publication of the Protein Society.

[61]  D Fischer,et al.  Hybrid fold recognition: combining sequence derived properties with evolutionary information. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[62]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[63]  S. Pietrokovski Searching databases of conserved sequence regions by aligning protein multiple-alignments. , 1996, Nucleic acids research.

[64]  S. Altschul,et al.  Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[65]  M J Rooman,et al.  Automatic definition of recurrent local structure motifs in proteins. , 1990, Journal of molecular biology.

[66]  Nick V. Grishin,et al.  Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments , 2003, Bioinform..

[67]  P. Y. Chou,et al.  Prediction of the secondary structure of proteins from their amino acid sequence. , 2006 .