Deep Ranking in Template-free Protein Structure Prediction

Understanding the biological activities of a protein molecule in the cell requires knowledge of its three-dimensional, biologically active structure(s). Current evidence suggests that significant regions of the protein universe are inaccessible to both wet-laboratory structure determination and homology modeling. While computational approaches have made great progress in elucidating dark regions of the proteome, inherent challenges remain. In this paper, we advance research on one such challenge, known as model (quality) assessment. In essence, the task is to identify the relevant structure(s) among the many computed for a protein of interest. We propose a method based on deep learning and evaluate it on tertiary structures computed by a popular de novo platform on benchmark datasets. The method estimates the quality of protein models from novel residue-residue distance features and improved residue-residue contacts, together with other features such as energies and model topology similarity. A detailed evaluation shows that the proposed method outperforms related ones and advances the state of the art in model assessment.
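To make the notion of a residue-residue contact feature concrete, the following is a minimal sketch of how agreement between a computed model and a set of predicted contacts could be scored. It is an illustration under stated assumptions, not the feature definition used in this paper: the function names, the 8 Å distance threshold, the minimum sequence separation of 6, and the toy coordinates are all hypothetical choices made for the example.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates
    (one 3D point per residue, e.g. its C-alpha atom)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_set(dist, threshold=8.0, min_separation=6):
    """Residue pairs (i, j), i < j, that are closer than `threshold`
    and at least `min_separation` positions apart in sequence.
    Both defaults are assumptions made for this sketch."""
    n = len(dist)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dist[i][j] < threshold
    }

def contact_agreement(model_coords, predicted_contacts,
                      threshold=8.0, min_separation=6):
    """Fraction of predicted contacts that are present in the model.
    Returns 0.0 when no contacts are predicted."""
    predicted = set(predicted_contacts)
    if not predicted:
        return 0.0
    observed = contact_set(distance_matrix(model_coords),
                           threshold, min_separation)
    return len(observed & predicted) / len(predicted)

# Toy hairpin of eight residues: two antiparallel strands 5 Å apart.
toy_model = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]
# Hypothetical predicted contacts, as (i, j) residue index pairs.
toy_predicted = [(0, 7), (1, 7), (2, 7)]
score = contact_agreement(toy_model, toy_predicted)
```

A score of this kind is a single scalar per model; in a learning-based assessment setting it would be only one of several inputs, alongside features such as energies and topology similarity.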
