ProteiNN: Intrinsic-Extrinsic Convolution and Pooling for Scalable Deep Protein Analysis

Proteins perform a large variety of functions in living organisms, thus playing a key role in biology. As of now, available learning algorithms to process protein data do not consider several particularities of such data and/or do not scale well for large protein conformations. To fill this gap, we propose two new learning operations enabling deep 3D analysis of large-scale protein data. First, we introduce a novel convolution operator which considers both, the intrinsic (invariant under protein folding) as well as extrinsic (invariant under bonding) structure, by using $n$-D convolutions defined on both the Euclidean distance, as well as multiple geodesic distances between atoms in a multi-graph. Second, we enable a multi-scale protein analysis by introducing hierarchical pooling operators, exploiting the fact that proteins are a recombination of a finite set of amino acids, which can be pooled using shared pooling matrices. Lastly, we evaluate the accuracy of our algorithms on several large-scale data sets for common protein analysis tasks, where we outperform state-of-the-art methods.

[1]  Andrew Gordon Wilson,et al.  Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data , 2020, ICML.

[2]  Xiao Lin,et al.  Spectral Multigraph Networks for Discovering and Fusing Relationships in Molecules , 2018, ArXiv.

[3]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[4]  Osamu Watanabe,et al.  Generalized Shortest Path Kernel on Graphs , 2015, Discovery Science.

[5]  Frederik Diehl,et al.  Edge Contraction Pooling for Graph Neural Networks , 2019, ArXiv.

[6]  P. Dobson,et al.  Distinguishing enzyme structures from non-enzymes without alignments. , 2003, Journal of molecular biology.

[7]  Martin Grohe,et al.  Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks , 2018, AAAI.

[8]  Alán Aspuru-Guzik,et al.  Convolutional Networks on Graphs for Learning Molecular Fingerprints , 2015, NIPS.

[9]  Yingyu Liang,et al.  N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules , 2018, NeurIPS.

[10]  Maria Jesus Martin,et al.  SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins , 2018, Nucleic Acids Res..

[11]  Regina Barzilay,et al.  Generative Models for Graph-Based Protein Design , 2019, DGS@ICLR.

[12]  Daniel C. Elton,et al.  Deep learning for molecular generation and optimization - a review of the state of the art , 2019, Molecular Systems Design & Engineering.

[13]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[14]  Maxat Kulmanov,et al.  DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier , 2017, Bioinform..

[15]  Jaewoo Kang,et al.  Self-Attention Graph Pooling , 2019, ICML.

[16]  Maxat Kulmanov,et al.  DeepGOPlus: Improved protein function prediction from sequence , 2019 .

[17]  Rishi Bedi,et al.  End-to-End Learning on 3D Protein Structure for Interface Prediction , 2019, NeurIPS.

[18]  Timo Ropinski,et al.  Monte Carlo convolution for learning on non-uniformly sampled point clouds , 2018, ACM Trans. Graph..

[19]  Nikos Paragios,et al.  EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation , 2017, PeerJ.

[20]  Gianni De Fabritiis,et al.  DeepSite: protein‐binding site predictor using 3D‐convolutional neural networks , 2017, Bioinform..

[21]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[22]  Jure Leskovec,et al.  Position-aware Graph Neural Networks , 2019, ICML.

[23]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[24]  Mathias Niepert,et al.  Learning Convolutional Neural Networks for Graphs , 2016, ICML.

[25]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[27]  Leonidas J. Guibas,et al.  KPConv: Flexible and Deformable Convolution for Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Jun Sese,et al.  Compound‐protein interaction prediction with end‐to‐end learning of neural networks for graphs and sequences , 2018, Bioinform..

[29]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[30]  Philip S. Yu,et al.  A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[31]  Yoshua Bengio,et al.  Deep convolutional networks for quality assessment of protein folds , 2018, Bioinform..

[32]  Vijay S. Pande,et al.  Molecular graph convolutions: moving beyond fingerprints , 2016, Journal of Computer-Aided Molecular Design.

[33]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[34]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[35]  Charu C. Aggarwal,et al.  Graph Convolutional Networks with EigenPooling , 2019, KDD.

[36]  Hae-Sang Park,et al.  A simple and fast algorithm for K-medoids clustering , 2009, Expert Syst. Appl..

[37]  Jure Leskovec,et al.  Hierarchical Graph Representation Learning with Differentiable Pooling , 2018, NeurIPS.

[38]  Dongdong Chen,et al.  Quantum-based subgraph convolutional neural networks , 2019, Pattern Recognit..

[39]  Stephan Günnemann,et al.  Directional Message Passing for Molecular Graphs , 2020, ICLR.

[40]  Samuel Karlin,et al.  Protein length in eukaryotic and prokaryotic proteomes , 2005, Nucleic acids research.

[41]  Yaron Lipman,et al.  Point convolutional neural networks by extension operators , 2018, ACM Trans. Graph..

[42]  Hans-Peter Kriegel,et al.  Protein function prediction via graph kernels , 2005, ISMB.

[43]  Hugues Hoppe,et al.  Progressive meshes , 1996, SIGGRAPH.

[44]  David Ryan Koes,et al.  Protein-Ligand Scoring with Convolutional Neural Networks , 2016, Journal of chemical information and modeling.

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[47]  Patrick Wieschollek,et al.  Flex-Convolution - Million-Scale Point-Cloud Learning Beyond Grid-Worlds , 2018, ACCV.

[48]  Joan Bruna,et al.  Deep Convolutional Networks on Graph-Structured Data , 2015, ArXiv.

[49]  Shuiwang Ji,et al.  Graph U-Nets , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[51]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.