Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction

BackgroundWe present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks.ResultsResults have shown that the best performing neural network is the one that uses input information comprising of PSI-BLAST [1] profiles of residue pairs, pairwise distance and the relative solvent accessibilities of the residues. This neural network is the best among all methods tested in discriminating the native structure from a set of decoys for all decoy datasets tested.ConclusionThis method is demonstrated to be viable, and furthermore evolutionary information is successfully used in the neural networks to improve decoy discrimination.

[1]  M. Levitt,et al.  Energy functions that discriminate X-ray and near native folds from well-constructed decoys. , 1996, Journal of molecular biology.

[2]  David T. Jones Successful ab initio prediction of the tertiary structure of NK‐lysin using multiple sequences and recognized supersecondary structural motifs , 1997, Proteins.

[3]  M. Levitt,et al.  A comprehensive analysis of 40 blind protein structure predictions , 2002, BMC Structural Biology.

[4]  Arne Elofsson,et al.  Pcons5: combining consensus, structural evaluation and fold recognition scores , 2005, Bioinform..

[5]  A. Sali,et al.  A composite score for predicting errors in protein structure models , 2006, Protein science : a publication of the Protein Society.

[6]  Arne Elofsson,et al.  MaxSub: an automated measure for the assessment of protein structure prediction quality , 2000, Bioinform..

[7]  M. Sippl Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. , 1990, Journal of molecular biology.

[8]  J. Skolnick,et al.  Averaging interaction energies over homologs improves protein fold recognition in gapless threading , 1999, Proteins.

[9]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[10]  R Samudrala,et al.  Decoys ‘R’ Us: A database of incorrect conformations to improve protein structure prediction , 2000, Protein science : a publication of the Protein Society.

[11]  Barry Honig,et al.  Local quality assessment in homology models using statistical potentials and support vector machines , 2007, Protein science : a publication of the Protein Society.

[12]  David C. Jones Predicting novel protein folds by using FRAGFOLD , 2001, Proteins.

[13]  Yang Zhang,et al.  SPICKER: A clustering approach to identify near‐native protein folds , 2004, J. Comput. Chem..

[14]  Adam Zemla,et al.  LGA: a method for finding 3D similarities in protein structures , 2003, Nucleic Acids Res..

[15]  Ruben Abagyan,et al.  Structural model of nicotinic acetylcholine receptor isotypes bound to acetylcholine and nicotine , 2002, BMC Structural Biology.

[16]  Liam J. McGuffin,et al.  Improving sequence-based fold recognition by using 3D model quality assessment , 2005, Bioinform..

[17]  B. Rost,et al.  Conservation and prediction of solvent accessibility in protein families , 1994, Proteins.

[18]  D T Jones,et al.  Benchmarking template selection and model quality assessment for high‐resolution comparative modeling , 2007, Proteins.

[19]  Manfred J. Sippl,et al.  Thirty years of environmental health research--and growing. , 1996, Nucleic Acids Res..

[20]  C Kooperberg,et al.  Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. , 1997, Journal of molecular biology.

[21]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[22]  Adam Zemla,et al.  Critical assessment of methods of protein structure prediction (CASP)‐round V , 2005, Proteins.

[23]  D. Baker,et al.  Clustering of low-energy conformations near the native structures of small proteins. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[24]  R Samudrala,et al.  Ab initio construction of protein tertiary structures using a hierarchical approach. , 2000, Journal of molecular biology.

[25]  Liam J. McGuffin,et al.  Improvement of the GenTHREADER Method for Genomic Fold Recognition , 2003, Bioinform..

[26]  Cathy H. Wu,et al.  The Universal Protein Resource (UniProt): an expanding universe of protein information , 2005, Nucleic Acids Res..

[27]  Yang Zhang,et al.  Scoring function for automated assessment of protein structure template quality , 2004, Proteins.

[28]  Silvio C. E. Tosatto,et al.  The Victor/FRST Function for Model Quality Estimation , 2005, J. Comput. Biol..

[29]  M. Levitt,et al.  A novel approach to decoy set generation: designing a physical energy function having local minima with native structure characteristics. , 2003, Journal of molecular biology.

[30]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[31]  Xiaolong Wang,et al.  Novel knowledge-based mean force potential at the profile level , 2006, BMC Bioinformatics.

[32]  Richard Bonneau,et al.  An improved protein decoy set for testing energy functions for protein structure prediction , 2003, Proteins.

[33]  E S Huang,et al.  Factors affecting the ability of energy functions to discriminate correct from incorrect folds. , 1997, Journal of molecular biology.