Graph-Based Community Detection for Decoy Selection in Template-Free Protein Structure Prediction

Significant efforts in wet and dry laboratories are devoted to resolving molecular structures. In particular, computational methods can now generate thousands of tertiary structures that populate the structure space of a protein molecule of interest. These advances allow us to turn our attention to analysis methodologies that organize the computed structures so as to highlight functionally relevant structural states. In this paper, we propose a methodology that leverages community detection methods, originally designed to detect communities in social networks, to organize computationally probed protein structure spaces. We report a principled comparison of such methods along several metrics on proteins of diverse folds and lengths, and we present a rigorous evaluation in the context of decoy selection in template-free protein structure prediction. The results make the case that network-based community detection methods warrant further investigation as a way to advance the analysis of protein structure spaces and the automated selection of functionally relevant structures.
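The abstract does not state how the structure graph is built or how communities are turned into selected decoys, so the following is only a minimal sketch under loudly stated assumptions. It assumes that each computed structure (decoy) is a node, that two decoys are linked when a pairwise structural distance (for example, RMSD) falls below a hypothetical `cutoff`, that communities are found with label propagation (one community detection algorithm from the social-network literature, swapped in here purely for illustration), and that communities are ranked by mean energy. The functions `build_graph`, `label_propagation`, and `select_decoys`, the threshold, the energy-based ranking rule, and the toy data are all hypothetical.

```python
from collections import Counter
from statistics import mean


def build_graph(dist, cutoff):
    """Adjacency sets: decoys i and j are linked when dist[i][j] < cutoff.

    `cutoff` is a hypothetical distance threshold (e.g. RMSD in Angstroms).
    """
    n = len(dist)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation with deterministic tie-breaking.

    Each node repeatedly adopts the most frequent label among its neighbours;
    ties go to the smallest label so the result is reproducible.
    """
    labels = {v: v for v in adj}  # every decoy starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated decoys keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break  # no label changed in a full sweep: converged
    communities = {}
    for v, lab in labels.items():
        communities.setdefault(lab, []).append(v)
    return list(communities.values())


def select_decoys(communities, energy, k=1):
    """Hypothetical rule: rank communities by mean energy (lower is better)
    and return the lowest-energy decoy from each of the top k communities."""
    ranked = sorted(communities, key=lambda c: mean(energy[v] for v in c))
    return [min(c, key=lambda v: energy[v]) for c in ranked[:k]]


# Toy example (hypothetical data): two tight groups of three decoys each.
DIST = [[0.0 if i == j else (1.0 if i // 3 == j // 3 else 5.0)
         for j in range(6)] for i in range(6)]
ENERGY = [-10.0, -12.0, -11.0, -20.0, -18.0, -19.0]

if __name__ == "__main__":
    comms = label_propagation(build_graph(DIST, cutoff=2.0))
    print(comms)                         # [[0, 1, 2], [3, 4, 5]]
    print(select_decoys(comms, ENERGY))  # [3]
```

The smallest-label tie-break replaces the random choices of the usual label propagation algorithm only so that the toy output is reproducible; a realistic pipeline would more likely rely on an established graph library and compare several community detection methods, as the abstract describes.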
