Global-Context Aware Generative Protein Design

The linear sequence of amino acids determines protein structure and function. Protein design, the inverse of protein structure prediction, aims to obtain a novel protein sequence that will fold into a given target structure. Recent work on computational protein design has studied designing sequences for a desired backbone structure using local positional information and has achieved competitive performance. However, similar local environments in different backbone structures may correspond to different amino acids, which indicates that the global context of the protein structure matters. We therefore propose the Global-Context Aware generative de novo protein design method (GCA), which consists of local modules and global modules. While local modules focus on relationships between neighboring amino acids, global modules explicitly capture non-local contexts. Experimental results demonstrate that the proposed GCA method achieves state-of-the-art performance on structure-based protein design. Our code and pretrained model have been released on GitHub.
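To make the distinction between local and global context concrete, the following is a minimal sketch, not the GCA implementation. It assumes that "local" means attention restricted to each residue's k nearest neighbors in space and that "global" means attention over every residue in the backbone; the abstract does not specify either mechanism. The function names (`knn_mask`, `global_mask`, `masked_softmax`) and the toy coordinates are hypothetical.

```python
import math

def knn_mask(coords, k):
    """Local context (assumed): residue i may attend only to its k nearest
    residues by Euclidean distance, itself included."""
    n = len(coords)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        order = sorted(range(n), key=lambda j: math.dist(coords[i], coords[j]))
        for j in order[:k]:
            mask[i][j] = True
    return mask

def global_mask(n):
    """Global context (assumed): every residue may attend to every residue."""
    return [[True] * n for _ in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax restricted to the positions allowed by the mask."""
    weights = []
    for row, allowed in zip(scores, mask):
        top = max(s for s, a in zip(row, allowed) if a)
        exps = [math.exp(s - top) if a else 0.0 for s, a in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy backbone: four residue positions, the last one far from the rest.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
scores = [[0.0] * 4 for _ in range(4)]  # uniform scores, for illustration only

local_weights = masked_softmax(scores, knn_mask(coords, k=2))
global_weights = masked_softmax(scores, global_mask(len(coords)))
```

With uniform scores, each row of `local_weights` places all of its weight on two nearby residues, while each row of `global_weights` spreads weight over all four. A model that combines both kinds of module can therefore separate two residues whose neighborhoods look alike but whose overall backbones differ.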
