Improving predicted protein loop structure ranking using a Pareto-optimality consensus method

Background

Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying native or near-native models by distinguishing them from misfolded ones is a critical step in protein loop structure prediction.

Results

We have developed a Pareto Optimal Consensus (POC) method, a consensus model-ranking approach that integrates multiple knowledge- or physics-based scoring functions. The procedure for identifying the best-quality models in a model set has two steps: 1) identifying the models on the Pareto-optimal front with respect to a set of scoring functions, and 2) ranking those models by their fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops 4 to 12 residues in length, using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which typically make up about 20% or less of the decoys in a set, cover the best or near-best decoys for more than 99% of the loop targets. Compared with the individual scoring function that yields the best selection accuracy on these decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in, respectively, distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model among the top five ranked models. The POC method is similarly effective on decoy sets from membrane protein loops. Furthermore, the POC method outperforms other commonly used consensus strategies for model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.

Conclusions

By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the others within a loop model set.
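The two-step POC procedure summarized in the abstract (find the Pareto-optimal front, then rank it by fuzzy dominance) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: lower scores are taken to be better for every scoring function, each function is min-max scaled to [1, 2], and the fuzzy dominance degree uses a product-of-ratios form. The abstract does not give the exact normalization or fuzzy dominance function. All names (`dominates`, `pareto_front`, `normalize`, `fuzzy_dominance`, `poc_rank`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# scores[model][function]: one row per loop decoy, one column per scoring
# function (e.g. Rosetta, DOPE, DDFIRE, ...). Assumption: lower is better for all.
Scores = Sequence[Sequence[float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every score and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Scores) -> List[int]:
    """Step 1: indices of models not dominated by any other model (O(n^2 m) scan)."""
    return [
        i for i, a in enumerate(scores)
        if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)
    ]


def normalize(scores: Scores) -> List[List[float]]:
    """Assumption: min-max scale each scoring function to [1, 2] so the ratios
    in fuzzy_dominance stay positive even for negative energies."""
    cols = list(zip(*scores))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [1.0 + (v - l) / (h - l) if h > l else 1.0 for v, l, h in zip(row, lo, hi)]
        for row in scores
    ]


def fuzzy_dominance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed product form: degree in (0, 1] to which a dominates b. A score
    contributes 1 where a is no worse, and b_k / a_k < 1 where a is worse."""
    degree = 1.0
    for x, y in zip(a, b):
        degree *= min(x, y) / x
    return degree


def poc_rank(scores: Scores) -> List[Tuple[int, float]]:
    """Step 2: order Pareto-optimal models by summed fuzzy dominance over
    every model outside the front (highest strength first; ties keep input order)."""
    front = pareto_front(scores)
    on_front = set(front)
    norm = normalize(scores)
    rest = [j for j in range(len(scores)) if j not in on_front]
    strength = {i: sum(fuzzy_dominance(norm[i], norm[j]) for j in rest) for i in front}
    return sorted(strength.items(), key=lambda kv: kv[1], reverse=True)
```

For a toy set of five decoys scored by two functions, `[[1.0, 5.0], [2.0, 2.0], [5.0, 1.0], [3.0, 3.0], [6.0, 6.0]]`, `pareto_front` returns `[0, 1, 2]`, and `poc_rank` places decoy 1 first (strength 2.0) because it is at least as good as decoys 3 and 4 on both scores.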
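For comparison, the baseline consensus strategies named in the abstract can be sketched as follows. The definitions here are assumptions, not taken from the paper: rank-by-number averages min-max normalized scores, rank-by-rank averages per-function ranks, and rank-by-vote counts how many scoring functions place a model in their top fraction (the 20% cutoff is a hypothetical default). Regression-based consensus is not sketched. Lower scores are again assumed better, ties keep input order, and all function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence

Scores = Sequence[Sequence[float]]  # scores[model][function]; lower assumed better


def _columns(scores: Scores) -> List[List[float]]:
    """Transpose to one list of scores per scoring function."""
    return [list(col) for col in zip(*scores)]


def _ranks(col: Sequence[float]) -> List[int]:
    """1-based rank of each model under one scoring function (1 = lowest = best)."""
    order = sorted(range(len(col)), key=lambda i: col[i])
    rank = [0] * len(col)
    for pos, i in enumerate(order, start=1):
        rank[i] = pos
    return rank


def _minmax(col: Sequence[float]) -> List[float]:
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]


def rank_by_number(scores: Scores) -> List[int]:
    """Order models by the mean of their min-max normalized scores."""
    cols = [_minmax(c) for c in _columns(scores)]
    consensus = [mean(vals) for vals in zip(*cols)]
    return sorted(range(len(scores)), key=lambda i: consensus[i])


def rank_by_rank(scores: Scores) -> List[int]:
    """Order models by their mean rank across the scoring functions."""
    cols = [_ranks(c) for c in _columns(scores)]
    consensus = [mean(vals) for vals in zip(*cols)]
    return sorted(range(len(scores)), key=lambda i: consensus[i])


def rank_by_vote(scores: Scores, top_fraction: float = 0.2) -> List[int]:
    """Each scoring function votes for the models in its top fraction;
    order models by vote count, most votes first."""
    n = len(scores)
    cutoff = max(1, round(n * top_fraction))
    votes = [0] * n
    for col in _columns(scores):
        for i, r in enumerate(_ranks(col)):
            if r <= cutoff:
                votes[i] += 1
    return sorted(range(n), key=lambda i: -votes[i])
```

On the same five-decoy toy set, `rank_by_rank` returns `[1, 0, 2, 3, 4]`, whereas `rank_by_vote` (one vote per function at this size) returns `[0, 2, 1, 3, 4]`, putting decoys 0 and 2, each best on a single score, ahead of decoy 1.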
