Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences

Machine learning based predictions of protein–protein interactions (PPIs) could provide valuable insights into protein functions, disease occurrence, and therapy design on a large scale. The intensive feature engineering in most of these methods makes the prediction task more tedious and trivial. The emerging deep learning technology enabling automatic feature engineering is gaining great success in various fields. However, the over-fitting and generalization of its models are not yet well investigated in most scenarios. Here, we present a deep neural network framework (DNN-PPI) for predicting PPIs using features learned automatically only from protein primary sequences. Within the framework, the sequences of two interacting proteins are sequentially fed into the encoding, embedding, convolution neural network (CNN), and long short-term memory (LSTM) neural network layers. Then, a concatenated vector of the two outputs from the previous layer is wired as the input of the fully connected neural network. Finally, the Adam optimizer is applied to learn the network weights in a back-propagation fashion. The different types of features, including semantic associations between amino acids, position-related sequence segments (motif), and their long- and short-term dependencies, are captured in the embedding, CNN and LSTM layers, respectively. When the model was trained on Pan’s human PPI dataset, it achieved a prediction accuracy of 98.78% at the Matthew’s correlation coefficient (MCC) of 97.57%. The prediction accuracies for six external datasets ranged from 92.80% to 97.89%, making them superior to those achieved with previous methods. When performed on Escherichia coli, Drosophila, and Caenorhabditis elegans datasets, DNN-PPI obtained prediction accuracies of 95.949%, 98.389%, and 98.669%, respectively. The performances in cross-species testing among the four species above coincided in their evolutionary distances. However, when testing Mus Musculus using the models from those species, they all obtained prediction accuracies of over 92.43%, which is difficult to achieve and worthy of note for further study. These results suggest that DNN-PPI has remarkable generalization and is a promising tool for identifying protein interactions.

[1]  Hui Zhang,et al.  Prediction of Protein-Protein Interactions from Amino Acid Sequences Based on Continuous and Discrete Wavelet Transform Features , 2018, Molecules.

[2]  Mark Gerstein,et al.  High-order neural networks and kernel methods for peptide-MHC binding prediction , 2015, Bioinform..

[3]  Yanzhi Guo,et al.  Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences , 2008, Nucleic acids research.

[4]  David K. Gifford,et al.  Convolutional neural network architectures for predicting DNA–protein binding , 2016, Bioinform..

[5]  Q. Zou,et al.  Hierarchical Classification of Protein Folds Using a Novel Ensemble Classifier , 2013, PloS one.

[6]  Shoba Ranganathan,et al.  Protein-protein interactions and prediction: a comprehensive overview. , 2013, Protein and peptide letters.

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Xiujun Gong,et al.  On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach , 2017, PloS one.

[9]  Fei Guo,et al.  Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree , 2017, PloS one.

[10]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[11]  Luhua Lai,et al.  Deep Learning for Drug-Induced Liver Injury , 2015, J. Chem. Inf. Model..

[12]  Jijun Tang,et al.  Identification of Protein–Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information , 2016, International journal of molecular sciences.

[13]  Xingming Zhao,et al.  Predicting protein–protein interactions from protein sequences using meta predictor , 2010, Amino Acids.

[14]  Shuai Li,et al.  A MapReduce based parallel SVM for large-scale predicting protein-protein interactions , 2014, Neurocomputing.

[15]  W J Cui,et al.  Mining topological structures of protein-protein interaction networks for human brain-specific genes. , 2015, Genetics and molecular research : GMR.

[16]  Hongbin Shen,et al.  Large-scale prediction of human protein-protein interactions from amino acid sequence based on latent topic features. , 2010, Journal of proteome research.

[17]  Jie Gui,et al.  Prediction of protein-protein interactions from protein sequence using local descriptors. , 2010, Protein and peptide letters.

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  Zhen Ji,et al.  Prediction of protein-protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set , 2014, BMC Bioinformatics.

[20]  W. Wasserman,et al.  Genome-wide prediction of cis-regulatory regions using supervised deep learning methods , 2016, BMC Bioinformatics.

[21]  Juan de Lara,et al.  Supporting user-oriented analysis for multi-view domain-specific visual languages , 2009, Inf. Softw. Technol..

[22]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[23]  Luhua Lai,et al.  Sequence-based prediction of protein protein interaction using a deep-learning algorithm , 2017, BMC Bioinformatics.

[24]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[25]  Zhu-Hong You,et al.  Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest , 2015, PloS one.

[26]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[27]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[28]  M. Gerstein,et al.  Global Analysis of Protein Activities Using Proteome Chips , 2001, Science.

[29]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[30]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[31]  Juwen Shen,et al.  Predicting protein–protein interactions based only on sequences information , 2007, Proceedings of the National Academy of Sciences.

[32]  Alex W. Wilkinson,et al.  Computational prediction of protein-protein interactions , 2012 .

[33]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[34]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[35]  Yun Gao,et al.  Prediction of Protein-Protein Interactions Using Local Description of Amino Acid Sequence , 2011 .

[36]  Jie Hou,et al.  DeepSF: deep convolutional neural network for mapping protein sequences to folds , 2017, Bioinform..

[37]  Zachary Chase Lipton A Critical Review of Recurrent Neural Networks for Sequence Learning , 2015, ArXiv.

[38]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[39]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[40]  Jianyang Zeng,et al.  A deep learning framework for modeling structural features of RNA-binding protein targets , 2015, Nucleic acids research.

[41]  Hong-Bin Shen,et al.  RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach , 2016, BMC Bioinformatics.

[42]  Zhu-Hong You,et al.  Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence , 2015, BioMed research international.