Surface ID: a geometry-aware system for protein molecular surface comparison

Abstract

Motivation: A protein can be represented in several forms, including its 1D sequence, 3D atom coordinates, and molecular surface. A protein surface contains rich structural and chemical features directly related to the protein’s function such as its ability to interact with other molecules. While many methods have been developed for comparing the similarity of proteins using the sequence and structural representations, computational methods based on molecular surface representation are limited.

Results: Here, we describe “Surface ID,” a geometric deep learning system for high-throughput surface comparison based on geometric and chemical features. Surface ID offers a novel grouping and alignment algorithm useful for clustering proteins by function, visualization, and in silico screening of potential binding partners to a target molecule. Our method demonstrates top performance in surface similarity assessment, indicating great potential for protein functional annotation, a major need in protein engineering and therapeutic design.

Availability and implementation: Source code for the Surface ID model, trained weights, and inference script are available at https://github.com/Sanofi-Public/LMR-SurfaceID.
