Deep Learning in Protein Structural Modeling and Design

Summary

Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, and it is impacting many fields, including protein structural modeling. Protein structural modeling, which includes predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, and predicting the properties or behavior of a protein, is critical for understanding and engineering biological systems at the molecular level. In this review, we summarize recent advances in applying deep learning techniques to problems in protein structural modeling and design. We dissect the emerging approaches and discuss the advances made and the challenges that must be addressed. We argue for the central importance of structure, following the “sequence → structure → function” paradigm. This review is intended to help computational biologists gain familiarity with the deep learning methods applied in protein modeling, and to help computer scientists gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
