Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms

Deep learning using neural networks relies on a class of machine-learnable models constructed using 'differentiable programs'. These programs can combine mathematical equations specific to a particular domain of natural science with general-purpose, machine-learnable components trained on experimental data. Such programs are having a growing impact on molecular and cellular biology. In this Perspective, we describe an emerging 'differentiable biology' in which phenomena ranging from the small and specific (for example, one experimental assay) to the broad and complex (for example, protein folding) can be modeled effectively and efficiently, often by exploiting knowledge about basic natural phenomena to overcome the limitations of sparse, incomplete and noisy data. By distilling differentiable biology into a small set of conceptual primitives and illustrative vignettes, we show how it can help to address long-standing challenges in integrating multimodal data from diverse experiments across biological scales. This promises to benefit fields as diverse as biophysics and functional genomics.
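As a minimal sketch of what a differentiable program built around a domain equation can look like, the example below fits the dissociation constant of a single-site binding isotherm by gradient descent. Everything in it is an illustrative assumption rather than a method from this Perspective: the choice of equation, the synthetic noise-free data, the function names, and the learning-rate settings are all hypothetical. The gradient is derived by hand to keep the example dependency-free; an automatic-differentiation framework such as JAX or PyTorch would compute it automatically, and would also allow a neural-network component to be trained jointly with the equation.

```python
import math

def fraction_bound(ligand, log_kd):
    """Single-site binding isotherm: bound fraction = L / (Kd + L), Kd = exp(log_kd)."""
    kd = math.exp(log_kd)
    return ligand / (kd + ligand)

def loss_and_grad(log_kd, ligands, observed):
    """Mean squared error and its derivative with respect to log_kd."""
    kd = math.exp(log_kd)
    loss, grad = 0.0, 0.0
    for lig, y in zip(ligands, observed):
        pred = lig / (kd + lig)
        err = pred - y
        loss += err * err
        # Chain rule: d(pred)/d(log_kd) = -L * Kd / (Kd + L)^2
        dpred = -lig * kd / (kd + lig) ** 2
        grad += 2.0 * err * dpred
    n = len(ligands)
    return loss / n, grad / n

def fit_log_kd(ligands, observed, log_kd=0.0, lr=1.0, steps=2000):
    """Plain gradient descent on log_kd (hypothetical hyperparameters)."""
    for _ in range(steps):
        _, grad = loss_and_grad(log_kd, ligands, observed)
        log_kd -= lr * grad
    return log_kd

# Synthetic, noise-free "measurements" generated from an assumed true Kd.
TRUE_KD = 5.0
ligands = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
observed = [fraction_bound(lig, math.log(TRUE_KD)) for lig in ligands]

fitted_kd = math.exp(fit_log_kd(ligands, observed))
print(f"fitted Kd = {fitted_kd:.4f}")
```

Optimizing log Kd rather than Kd keeps the constant positive without an explicit constraint, which is one common way of encoding prior physical knowledge in the parameterization.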
