Improving predicted protein loop structure ranking using a Pareto-optimality consensus method

Background: Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results: We have developed a Pareto Optimal Consensus (POC) method, a consensus model ranking approach that integrates multiple knowledge- or physics-based scoring functions. The procedure for identifying the best-quality models in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on their fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops 4 to 12 residues in length, using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which typically make up ~20% or less of the decoys in a set, cover the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model among the top 5 ranked models, respectively. The POC method is similarly effective on decoy sets from membrane protein loops. Furthermore, the POC method outperforms other commonly used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.

Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the others within a loop model set.
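
The two-step procedure described in the abstract (find the Pareto-optimal front, then rank its members by fuzzy dominance over the remaining models) can be illustrated with a minimal Python sketch. It assumes that lower scores are better for every scoring function and that scores are min-max normalized per function. The function names and the `fuzzy_dominance` degree (the mean positive margin by which one model beats another) are illustrative assumptions, not the definitions used in the paper.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (lower is better on every score)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores):
    """Indices of models that no other model dominates."""
    return [
        i for i, a in enumerate(scores)
        if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)
    ]


def normalize(scores):
    """Min-max rescale each scoring function to [0, 1] over the model set."""
    cols = list(zip(*scores))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        for row in scores
    ]


def fuzzy_dominance(a, b):
    """Illustrative degree in [0, 1] to which a dominates b.

    Assumption: mean positive margin over normalized scores; the paper's
    actual fuzzy dominance definition may differ.
    """
    return sum(max(0.0, y - x) for x, y in zip(a, b)) / len(a)


def poc_rank(scores):
    """Rank Pareto-front models by total fuzzy dominance over all others."""
    norm = normalize(scores)
    front = pareto_front(norm)
    total = {
        i: sum(fuzzy_dominance(norm[i], norm[j])
               for j in range(len(norm)) if j != i)
        for i in front
    }
    # Stable sort: ties keep their original index order.
    return sorted(front, key=lambda i: -total[i])


# Hypothetical decoy set: five models scored by two scoring functions.
decoys = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
ranking = poc_rank(decoys)
```

In this toy set, models 3 and 4 are dominated and never enter the ranking; only the three front members are ordered. Restricting the ranking to the front mirrors the abstract's observation that the Pareto-optimal set is a small fraction of the decoys yet usually contains the best ones.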
