Fragment‐based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities

We derived general and transferable statistical potentials that quantify the compatibility between the local structures and local sequences of peptide fragments in proteins. In the derivation, five‐residue fragments from native proteins were clustered according to their conformations, represented by a local structure alphabet (de Brevern et al., Proteins 2000;41:271–287), together with their secondary structure states and solvent accessibilities. From the native sequences of the structurally clustered fragments, the probabilities of different amino acid sequences were estimated for each structure cluster. Statistical energies as a function of sequence for a given structure were derived directly from these sequence probabilities. The same sequence probabilities were employed in a database‐matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, this is an integrated approach in three respects: local conformations and local environments are treated jointly; structures are treated in units of fragments rather than individual residues, so that coupling between the conformations of adjacent residues is included; and the strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudosequence design, and local structure prediction, the potentials performed at least comparably to, and in most cases better than, a number of existing models applicable to the same contexts. These results indicate the advantages of such an integrated approach for deriving local potentials and suggest that the potentials derived here are applicable to sequence design and structure prediction. Proteins 2009. © 2008 Wiley‐Liss, Inc.
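As a rough illustration of how sequence probabilities within a structure cluster can be turned into statistical energies, the sketch below scores a five‐residue sequence against each cluster with a log‐odds energy, E = −Σ ln[P(aa | cluster, position) / P_background(aa)]. Everything in it is an assumption made for illustration and not the procedure of the paper: the cluster names, the toy fragments, the pseudocount, the treatment of fragment positions as independent, and the choice of the pooled amino acid composition as the reference state.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_probabilities(fragments, pseudocount=1.0):
    """Per-position amino acid probabilities for one structure cluster.

    Assumption: positions are treated as independent, and a flat
    pseudocount smooths unobserved residues.
    """
    length = len(fragments[0])
    total = len(fragments) + pseudocount * len(AMINO_ACIDS)
    probs = []
    for pos in range(length):
        counts = Counter(frag[pos] for frag in fragments)
        probs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return probs

def background_probabilities(clusters, pseudocount=1.0):
    """Amino acid composition pooled over all clusters (assumed reference state)."""
    counts = Counter(aa for frags in clusters.values() for frag in frags for aa in frag)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def fragment_energy(sequence, probs, background):
    """Log-odds energy of a sequence in a cluster; lower means more compatible."""
    return -sum(math.log(probs[i][aa] / background[aa]) for i, aa in enumerate(sequence))

# Hypothetical toy clusters; real clusters would combine a structure alphabet
# letter with secondary structure and solvent accessibility states.
clusters = {
    "helix_buried": ["LAALL", "LLAAL", "ALLAL", "LAALA"],
    "strand_exposed": ["TKTKT", "KTKTK", "TKTET", "KTETK"],
}

background = background_probabilities(clusters)
energies = {
    name: fragment_energy("LAALL", position_probabilities(frags), background)
    for name, frags in clusters.items()
}
best = min(energies, key=energies.get)
print(best, energies)
```

Ranking the clusters by energy for a fixed sequence, as in the last lines, is only a crude stand‐in for a structure‐given‐sequence score; it does not reproduce the database‐matching approach described in the abstract.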
