EvoEF2: accurate and fast energy function for computational protein design

MOTIVATION The accuracy and success rate of de novo protein design remain limited, mainly because current energy functions suffer from parameter overfitting and cannot reliably discriminate incorrect designs from correct ones.

RESULTS We developed EvoEF2, an extended energy function for efficient de novo protein sequence design, based on EvoEF, a previously proposed physical energy function. EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues, respectively, for 148 test monomers. It was also generally applicable to protein-protein interaction design, recapitulating 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues, respectively, for 88 test dimers, and it significantly outperformed EvoEF in native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences: all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, and 87.8% of the predicted structures had a root-mean-square deviation (RMSD) of less than 2 Å from their native counterparts. The study also demonstrated that the usefulness of a physical energy function depends strongly on how its parameters are optimized: EvoEF2, whose parameters were optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data.

AVAILABILITY The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
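Sequence recovery, the metric behind the percentages reported in the abstract, is the fraction of designed positions whose amino acid matches the native one, computed over all residues or over a region such as the core, surface or interface. The Python sketch below is illustrative only: it assumes the region label of each position has already been assigned (the abstract does not state the burial or interface criterion used for EvoEF2), and the function name `sequence_recovery` is hypothetical, not part of the EvoEF2 code. It pools positions rather than averaging per protein, which may differ from how the published figures were aggregated.

```python
from collections import defaultdict


def sequence_recovery(native: str, designed: str, labels: list[str]) -> dict[str, float]:
    """Hypothetical helper: fraction of positions where the designed residue
    equals the native residue, overall ('all') and per region label.

    `labels` gives one region tag per position (e.g. 'core', 'surface',
    'interface'); how those tags are assigned is an assumption left to the caller.
    """
    if not (len(native) == len(designed) == len(labels)):
        raise ValueError("native, designed and labels must have the same length")

    matches: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for nat, des, region in zip(native, designed, labels):
        # Count each position once toward 'all' and once toward its own region;
        # the set avoids double counting if a caller ever labels a position 'all'.
        for key in {"all", region}:
            totals[key] += 1
            matches[key] += nat == des  # bool adds as 0 or 1
    return {key: matches[key] / totals[key] for key in totals}


# Toy example with made-up sequences and labels (not data from the paper).
native = "MKVLAE"
designed = "MKILSE"
labels = ["surface", "core", "core", "core", "surface", "core"]
print(sequence_recovery(native, designed, labels))
```

Here 4 of 6 positions match overall, 3 of 4 core positions match and 1 of 2 surface positions match. If positions also carry an intermediate burial label, the 'all' value is the length-weighted average over every label, which is why an overall recovery can fall between the core and surface values.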
