iCFN: an efficient exact algorithm for multistate protein design

Motivation: Multistate protein design addresses real-world challenges, such as multi-specificity design and backbone flexibility, by considering both positive and negative protein states, each represented by an ensemble of substates. It also poses an enormous challenge to exact algorithms, which guarantee optimal solutions and thereby enable a direct test of the mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate protein design.

Results: We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for multistate protein design. Its generic formulation allows for a wide array of applications, such as stability, affinity and specificity designs, while addressing concerns such as global flexibility of protein backbones. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a CFN, and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a tree structure of sequences, substates, and conformations. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size for exact methods, it drastically reduces the search space and running time, making the problem tractable. Moreover, iCFN generates receptor designs that agree with experiment and are more accurate than those of state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity.

Availability and implementation: https://shen-lab.github.io/software/iCFN

Supplementary information: Supplementary data are available at Bioinformatics online.
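To make the WCSP and branch-and-bound terminology concrete, here is a minimal sketch of depth-first branch-and-bound on a toy single-state WCSP with unary and pairwise costs. It is an illustration only and not iCFN: the function name `solve_wcsp`, the data layout, and the simple lower bound (sum of the cheapest unary costs of unassigned variables) are all assumptions made for this example. The bounds iCFN actually uses and its search over sequences, substates, and conformations are not specified in the abstract and are not reproduced here.

```python
def solve_wcsp(unary, pairwise):
    """Depth-first branch-and-bound for a toy WCSP with non-negative costs.

    unary[i][a]            : cost of assigning value a to variable i
    pairwise[(i, j)][a][b] : cost of value a at i together with value b at j (i < j)
    Returns (best_cost, best_assignment).
    Hypothetical helper for illustration; not part of any iCFN release.
    """
    n = len(unary)

    # Lower bound on the cost still to come from variables i..n-1:
    # the cheapest unary cost of each. Pairwise costs are assumed >= 0,
    # so leaving them out never overestimates (the bound is admissible).
    suffix_min = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(unary[i])

    best = [float("inf"), None]  # incumbent cost and assignment

    def extend(i, assignment, cost):
        # Prune: this subtree cannot beat the incumbent.
        if cost + suffix_min[i] >= best[0]:
            return
        if i == n:
            best[0], best[1] = cost, tuple(assignment)
            return
        # Try values with cheaper unary cost first to find good incumbents early.
        for a in sorted(range(len(unary[i])), key=lambda v: unary[i][v]):
            step = unary[i][a]
            for j in range(i):
                table = pairwise.get((j, i))
                if table is not None:
                    step += table[assignment[j]][a]
            assignment.append(a)
            extend(i + 1, assignment, cost + step)
            assignment.pop()

    extend(0, [], 0.0)
    return best[0], best[1]


# Toy instance: three variables (think "positions") with two values ("rotamers") each.
unary = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.5]]
pairwise = {
    (0, 1): [[0.0, 0.0], [3.0, 0.0]],
    (1, 2): [[0.0, 2.0], [0.0, 0.0]],
}
print(solve_wcsp(unary, pairwise))  # prints (2.0, (0, 0, 0))
```

In this instance the greedy choice for the first variable (value 1, unary cost 0) incurs a pairwise penalty later, so the search backtracks and returns the assignment (0, 0, 0) with total cost 2.0. Pruning against the incumbent is what lets an exact method skip most of the search tree while still guaranteeing optimality.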
