Novel knowledge-based mean force potential at the profile level

Background

The development and testing of functions for modeling protein energetics is an important part of current research aimed at understanding protein structure and function. Knowledge-based mean force potentials are derived from statistical analyses of interacting groups in experimentally determined protein structures. Current knowledge-based mean force potentials are developed at the atom or amino acid level, so the evolutionary information contained in sequence profiles goes unused. Based on these observations, a class of novel knowledge-based mean force potentials at the profile level is presented, which uses the evolutionary information in profiles to build more powerful statistical potentials.

Results

The frequency profiles are calculated directly from the multiple sequence alignments produced by PSI-BLAST and converted into binary profiles with a probability threshold. As a result, protein sequences are represented as sequences of binary profiles rather than sequences of amino acids. By analogy with knowledge-based potentials at the residue level, a class of novel potentials at the profile level is introduced. We develop four types of profile-level statistical potentials: distance-dependent, contact, Φ/Ψ dihedral angle and accessible surface potentials. These potentials are first evaluated by fold assessment, that is, by discriminating between correct and incorrect models generated by comparative modeling in our own and other groups. They are then used to recognize native structures in well-constructed decoy sets. Experimental results show that all the knowledge-based mean force potentials at the profile level outperform those at the residue level. Significant improvements are obtained for the distance-dependent and accessible surface potentials (5–6%). The contact and Φ/Ψ dihedral angle potentials show only a slight improvement (1–2%). Decoy set evaluation results show that the distance-dependent profile-level potentials even outperform other atom-level potentials. We also demonstrate that profile-level statistical potentials can improve the performance of threading.

Conclusion

The knowledge-based mean force potentials at the profile level provide better discriminatory ability than those at the residue level, so they will be useful for protein structure prediction and model refinement.
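The conversion from frequency profiles to binary profiles described in the Results can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the gap handling, and the threshold value of 0.15 are assumptions made for the example (the abstract states only that "a probability threshold" is used). The last function shows the generic inverse-Boltzmann form commonly used for knowledge-based potentials; the reference state actually used in the paper is not given in the abstract.

```python
import math

# The 20 standard amino acids, in a fixed order that defines the profile columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(column):
    """Relative frequency of each amino acid in one alignment column.

    Gaps and non-standard characters are ignored (an assumption of this sketch).
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def to_binary_profile(freqs, threshold=0.15):
    """Set a position to 1 when its frequency exceeds the threshold, else 0.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return tuple(1 if f > threshold else 0 for f in freqs)


def binary_profile_label(bits):
    """Readable label: the amino acids whose bit is set."""
    return "".join(a for a, b in zip(AMINO_ACIDS, bits) if b)


def mean_force_energy(observed, expected, kT=1.0):
    """Generic inverse-Boltzmann energy: -kT * ln(observed / expected).

    Both arguments must be positive probabilities or frequencies.
    """
    return -kT * math.log(observed / expected)


# One alignment column: A x4, G x3, S x1, V x1, plus one gap.
column = "AAAAGGGS-V"
bits = to_binary_profile(frequency_profile(column))
print(binary_profile_label(bits))  # AG
```

In a scheme like this, each sequence position is replaced by its binary profile, and the statistics for a potential are gathered over pairs or environments of binary profiles instead of over the 20 amino acid types.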
