Community Detection for Decoy Selection in Template-free Protein Structure Prediction

Significant efforts are devoted to resolving biologically active structures in wet and dry laboratories. In particular, owing to hardware and algorithmic innovations, computational methods can now obtain thousands of structures that populate the structure space of a protein of interest. With such advances, attention turns to organizing these computed structures to reveal the underlying organization of the structure space and thereby highlight biologically active structural states. In this paper, we report on the promise of leveraging community detection methods, originally designed to detect communities in social networks, to organize protein structure spaces probed in silico. We present a principled comparison of such methods along several metrics and on proteins of diverse folds and lengths. More importantly, we present a rigorous evaluation in the context of decoy selection in template-free protein structure prediction. The results make the case that network-based community detection methods warrant further investigation as a means to advance the analysis of protein structure spaces for the automated selection of biologically active structures.
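The idea of organizing a structure space as a network and selecting decoys from its communities can be illustrated with a minimal sketch. This is not the method evaluated here. It rests on several stated assumptions. Decoys are nodes, and two decoys are linked when a pairwise structural distance (for example, RMSD) falls below a threshold `eps`. Label propagation stands in for the community detection methods being compared. Finally, the rule of picking the lowest-energy decoy from each of the largest communities is a hypothetical selection strategy. The helper names `build_structure_graph`, `label_propagation`, and `select_decoys` are likewise hypothetical.

```python
import random
from collections import Counter


def build_structure_graph(dist, eps):
    """Link decoys i and j when their (assumed) pairwise distance is below eps."""
    n = len(dist)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < eps:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def label_propagation(adj, max_iter=100, seed=0):
    """Asynchronous label propagation; returns communities sorted by size (largest first)."""
    rng = random.Random(seed)          # seeded visit order for reproducibility
    labels = {v: v for v in adj}       # every decoy starts in its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue               # isolated decoys stay singleton communities
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            # adopt the most frequent neighbor label; break ties toward the smallest label
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:                # simplified stopping rule: a full pass changed nothing
            break
    groups = {}
    for v, lab in labels.items():
        groups.setdefault(lab, []).append(v)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


def select_decoys(communities, energy, k):
    """Hypothetical rule: lowest-energy decoy from each of the k largest communities."""
    return [min(c, key=lambda i: energy[i]) for c in communities[:k]]
```

Ties are broken toward the smallest label and the visit order is seeded so that the sketch is reproducible. The original label propagation algorithm breaks ties at random, and other community detection methods (for example, modularity-based ones) could be substituted in `label_propagation` without changing the surrounding selection logic.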
