Ranking the quality of protein structure models using sidechain based network properties

Determining the correct structure of a protein from its sequence remains an arduous task, and many researchers are working towards this goal. Most structure prediction methods generate a large number of probable candidates, which leaves the final challenge of selecting the best among them. In this work, we use Protein Structure Networks (PSNs) of native and modeled proteins in combination with Support Vector Machines (SVMs) to estimate the quality of protein structure models and to rank them. Models are ranked by regression analysis, which helps select a model from a group of many similar, good-quality structures. Our results show that structures with a rank greater than 16 exhibit native-like properties, while those ranked below 10 are non-native-like. The tool is also available as a web server (http://vishgraph.mbu.iisc.ernet.in/GraProStr/native_non_native_ranking.html), where five modeled structures can be evaluated at a time.
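To illustrate the kind of sidechain-based network properties involved, the Python sketch below builds a simple sidechain contact network from atom coordinates and computes two basic properties (mean degree and largest connected cluster). It is a minimal sketch under stated assumptions, not the method of this work: the function names, the input format, the 4.5 Å atom-pair cutoff, the percentage threshold `i_min`, and the normalization by sidechain atom counts are all hypothetical choices made for illustration, and the SVM regression that produces the rank is not reproduced. Only `interpret_rank` encodes something stated above, namely the rank thresholds of 16 and 10.

```python
import math
from itertools import combinations


def contact_counts(residues, cutoff=4.5):
    """Count sidechain atom pairs within `cutoff` angstroms for each residue pair.

    `residues` maps a residue label to a list of (x, y, z) sidechain atom
    coordinates (a hypothetical input format chosen for this sketch).
    """
    counts = {}
    for a, b in combinations(residues, 2):
        n = sum(1 for p in residues[a] for q in residues[b] if math.dist(p, q) <= cutoff)
        if n:
            counts[(a, b)] = n
    return counts


def build_network(residues, cutoff=4.5, i_min=4.0):
    """Return an adjacency map; two residues are linked when strength >= i_min (%)."""
    adj = {r: set() for r in residues}
    for (a, b), n in contact_counts(residues, cutoff).items():
        # ASSUMPTION: normalize by the sidechain atom counts in this structure,
        # a simplified stand-in for residue-type normalization factors.
        strength = 100.0 * n / math.sqrt(len(residues[a]) * len(residues[b]))
        if strength >= i_min:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def largest_cluster_size(adj):
    """Size of the largest connected component, found by iterative depth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def network_features(adj):
    """A hypothetical, deliberately small feature set for one structure."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    mean_degree = sum(degrees) / len(degrees) if degrees else 0.0
    return {"mean_degree": mean_degree, "largest_cluster": largest_cluster_size(adj)}


def interpret_rank(rank):
    """Apply the thresholds stated in the abstract to a predicted rank."""
    if rank > 16:
        return "native-like"
    if rank < 10:
        return "non-native-like"
    return "between thresholds"


if __name__ == "__main__":
    # Toy coordinates: A1 and B2 are in sidechain contact, C3 is isolated.
    toy = {
        "A1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        "B2": [(3.0, 0.0, 0.0)],
        "C3": [(20.0, 0.0, 0.0)],
    }
    print(network_features(build_network(toy)))
    print(interpret_rank(17.5))
```

In a full pipeline, features like these would be computed for native and modeled structures and supplied to a trained regression model; the score it predicts is what `interpret_rank` would then classify. The abstract does not characterize ranks between 10 and 16, so the sketch leaves them unlabeled.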
