Lattice protein design using Bayesian learning

Protein design is the inverse of three-dimensional (3D) structure prediction and serves to elucidate the relationship between 3D structures and amino acid sequences. In general, protein design computation involves a double loop: an outer loop over amino acid sequence changes and, for each sequence, an inner loop that performs an exhaustive conformational search. Herein, we propose a novel statistical mechanical design method using Bayesian learning, which can design lattice proteins without the exhaustive conformational search. We consider a thermodynamic hypothesis of the evolution of proteins and apply it to the prior distribution of amino acid sequences. Furthermore, we take the effect of water into account from the viewpoint of the grand canonical picture. Applied to the 2D lattice hydrophobic-polar (HP) model, our design method successfully finds an amino acid sequence for which the target conformation is the unique ground state. However, performance on the 3D lattice HP models was not as good as on the 2D models. Performance in 3D improves when 20-letter lattice proteins are used. Furthermore, we find a strong linear relationship between the chemical potential of water and the number of surface residues, thereby revealing the relationship between protein structure and the effect of water molecules. The advantage of our method is that it greatly reduces computation time, because it does not require the long partition function calculations that correspond to an exhaustive conformational search. As our method uses a general form of Bayesian learning and statistical mechanics and is not limited to lattice proteins, the results presented here elucidate some heuristics used successfully in previous protein design methods.
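For orientation, the conventional double loop described above can be sketched for a very short 2D HP chain. This is a minimal illustration of the brute-force baseline that the proposed method is meant to avoid, not the Bayesian method itself. The function names (`walks`, `hp_energy`, `ground_states`, `design_by_double_loop`) are hypothetical, and the energy convention (-1 per non-bonded H-H lattice contact) is the standard HP-model choice, assumed here because the abstract does not state it.

```python
from itertools import product

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def walks(n):
    """Yield all n-site (n >= 2) self-avoiding walks on the square lattice.

    Rotations are removed by fixing the first step to +x; mirror images
    are removed by forbidding a -y step before the first turn.
    """
    def extend(path, occupied, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if dy == -1 and not turned:
                continue  # skip the mirror image of an already counted walk
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue  # self-avoidance
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(path, occupied, turned or dy != 0)
            path.pop()
            occupied.remove(nxt)

    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start), False)

def hp_energy(seq, conf):
    """Assumed HP energy: -1 for each H-H lattice contact not bonded along the chain."""
    index_at = {site: i for i, site in enumerate(conf)}
    energy = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair is counted once
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                energy -= 1
    return energy

def ground_states(seq):
    """Inner loop: exhaustive conformational search for one sequence."""
    best_energy, best = None, []
    for conf in walks(len(seq)):
        e = hp_energy(seq, conf)
        if best_energy is None or e < best_energy:
            best_energy, best = e, [conf]
        elif e == best_energy:
            best.append(conf)
    return best_energy, best

def design_by_double_loop(target):
    """Outer loop: keep every HP sequence whose unique ground state is `target`.

    `target` must be given in the same canonical orientation that `walks` produces.
    """
    designed = []
    for letters in product("HP", repeat=len(target)):
        seq = "".join(letters)
        _, states = ground_states(seq)
        if states == [target]:
            designed.append(seq)
    return designed

# Four-residue U-shaped target: residues 0 and 3 are in contact.
target = ((0, 0), (1, 0), (1, 1), (0, 1))
print(design_by_double_loop(target))  # ['HHHH', 'HHPH', 'HPHH', 'HPPH']
```

The cost of this baseline is the number of sequences times the number of conformations, and both grow exponentially with chain length, which is why removing the inner exhaustive search matters.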
