Structural Learning of Proteins Using Graph Convolutional Neural Networks

The exponential growth of protein structure databases has motivated the development of efficient deep learning methods that perform structural analysis tasks at large scale, ranging from the classification of experimentally determined proteins to the quality assessment and ranking of computationally generated protein models in the context of protein structure prediction. Yet the literature describing these methods rarely interprets what the models learn during training or identifies the specific data attributes that contribute to the classification or regression task. Although 3D and 2D CNNs have been widely used on structural data, they have several limitations when applied to structural proteomics data. We posit that graph-based convolutional neural networks (GCNNs) are an efficient alternative that also produces interpretable results. In this work, we demonstrate the applicability of GCNNs to protein structure classification problems. We define a novel spatial graph convolution network architecture that employs graph reduction methods to reduce the total number of trainable parameters and to promote abstraction in intermediate representations. We show that GCNNs learn effectively from simplistic graph representations of protein structures while making it possible to interpret what the network learns during training and how it applies that knowledge to perform its task. GCNNs perform comparably to their 2D CNN counterparts in predictive performance but are outperformed by them in training speed. Compared with 3D CNNs, the graph-based data representation makes GCNNs a more efficient option for large-scale datasets, because their preprocessing costs and data storage requirements are negligible in comparison.
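
As a rough illustration of learning from a simple graph representation of a protein structure, the following minimal sketch builds a residue contact graph from 3D coordinates and applies one mean-aggregation graph convolution. The 8 Å cutoff, the function names `contact_graph` and `graph_conv`, and the random coordinates, features, and weights are assumptions made for illustration. It is a generic spatial graph convolution, not the architecture defined in this work, and it omits the graph reduction step.

```python
import numpy as np


def contact_graph(coords, cutoff=8.0):
    """Return a binary adjacency matrix linking residues within `cutoff` angstroms.

    The cutoff value is an illustrative assumption. Self-loops are included
    because each residue is at distance 0 from itself.
    """
    diff = coords[:, None, :] - coords[None, :, :]  # pairwise displacement vectors
    dist = np.linalg.norm(diff, axis=-1)            # pairwise Euclidean distances
    return (dist <= cutoff).astype(float)


def graph_conv(adj, features, weights):
    """Average each node's neighborhood features, project them, apply ReLU."""
    degree = adj.sum(axis=1, keepdims=True)   # at least 1 thanks to self-loops
    aggregated = (adj @ features) / degree    # mean over each node's neighbors
    return np.maximum(aggregated @ weights, 0.0)


# Hypothetical toy input: 5 residues, 4 features per residue, 2 output channels.
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(5, 3))
features = rng.normal(size=(5, 4))
weights = rng.normal(size=(4, 2))

adj = contact_graph(coords)
hidden = graph_conv(adj, features, weights)
print(hidden.shape)
```

Storing only an adjacency matrix and per-residue features, rather than a voxel grid, is what keeps the preprocessing and storage footprint of such a representation small.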
