Deciphering interaction fingerprints from protein molecular surfaces

Predicting interactions between proteins and other biomolecules based purely on structure is an unsolved problem in biology. The molecular surface, a high-level description of protein structure, displays patterns of chemical and geometric features that fingerprint a protein’s modes of interaction with other biomolecules. We hypothesize that proteins performing similar interactions may share common fingerprints, independent of their evolutionary history. Such fingerprints may be difficult to grasp by visual analysis but could be learned from large-scale datasets. We present MaSIF, a conceptual framework based on a new geometric deep learning method that captures fingerprints important for specific biomolecular interactions. We showcase MaSIF on three prediction challenges: protein pocket-ligand prediction, protein-protein interaction site prediction, and ultrafast scanning of protein surfaces for prediction of protein-protein complexes. We anticipate that this conceptual framework will improve our understanding of protein function and design.
