Computational Protein Design with Deep Learning Neural Networks

Computational protein design has a wide variety of applications. Despite its remarkable successes, designing a protein for a given structure and function remains a challenging task. Meanwhile, the number of solved protein structures is increasing rapidly while the number of unique protein folds has plateaued, which suggests that more structural information is accumulating for each fold. Deep learning neural networks are a powerful method for learning from such large data sets and have shown superior performance in many machine learning fields. In this study, we applied a deep learning neural network approach to computational protein design, predicting the probability of each of the 20 natural amino acids at each residue position in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features, and the best network achieved an accuracy of 38.3%. Using the network output as residue type restraints improved the average sequence identity when designing three natural proteins with Rosetta. Moreover, the predictions from our network show ~3% higher sequence identity than a previous method. Results from this study may benefit the further development of computational protein design methods.
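The prediction target described above can be illustrated with a minimal sketch. It assumes that the network emits 20 raw scores per residue, that a softmax turns them into probabilities, and that accuracy means the fraction of positions where the most probable type matches the native residue. The abstract does not state these details, and the function names and toy scores below are hypothetical.

```python
import math

# One-letter codes of the 20 natural amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def residue_probabilities(logits):
    """Map 20 output scores to an {amino acid: probability} dict."""
    if len(logits) != len(AMINO_ACIDS):
        raise ValueError("expected one score per amino acid type")
    return dict(zip(AMINO_ACIDS, softmax(logits)))


def top1_accuracy(per_residue_logits, native_sequence):
    """Fraction of positions whose most probable type is the native residue."""
    hits = 0
    for logits, native in zip(per_residue_logits, native_sequence):
        probs = residue_probabilities(logits)
        predicted = max(probs, key=probs.get)
        hits += predicted == native
    return hits / len(native_sequence)


# Hypothetical scores for a three-residue toy example (not real network output).
toy_logits = [
    [2.0 if aa == "A" else 0.0 for aa in AMINO_ACIDS],
    [2.0 if aa == "L" else 0.0 for aa in AMINO_ACIDS],
    [2.0 if aa == "G" else 0.0 for aa in AMINO_ACIDS],
]
accuracy = top1_accuracy(toy_logits, "ALK")
```

Here the toy scores favor A, L and G while the native sequence is ALK, so two of the three positions count as correct. The per-residue probability dictionaries are the kind of quantity that could serve as residue type restraints in a design program, although how the study passes them to Rosetta is not specified here.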
