Graph-based prediction of protein-protein interactions with attributed signed graph embedding

Background: Protein-protein interactions (PPIs) are central to many biological processes. Because experimental methods for identifying PPIs are time-consuming and expensive, automated computational methods for predicting PPIs are needed. Various machine learning methods have been proposed, including a sequence-based deep learning technique that has achieved promising results. However, that approach uses only sequence information and ignores the structural information of PPI networks. Structural information about proteins in a PPI network, such as node degree, position, and neighboring nodes in the graph, has been shown to be informative for PPI prediction.

Results: To address the challenge of representing graph information, we introduce an improved graph representation learning method. Our model predicts PPIs from both sequence information and graph structure. It builds on a representation learning model and applies a graph-based deep learning method to PPI prediction, outperforming existing sequence-based methods. Our method achieves a state-of-the-art accuracy of 99.15% on the Human Protein Reference Database (HPRD) dataset and also obtains the best results on the Database of Interacting Proteins (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans) datasets.

Conclusion: We introduce the signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method that automatically learns to encode graph structure into low-dimensional embeddings. Experimental results show that our method outperforms existing sequence-based methods on several datasets. We also demonstrate the robustness of our model on very sparse networks and its generalization to a new dataset that combines four datasets: HPRD, E. coli, C. elegans, and Drosophila.
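To make the structural information named in the Background concrete, the following is a minimal Python sketch of how node degree and neighboring nodes can be read off a list of known interactions, together with a simple neighborhood-overlap score for a candidate pair. It is not the S-VGAE model described here; the protein names, the toy edge list, the helper names (`build_neighbors`, `degree`, `jaccard`), and the choice of Jaccard overlap as a score are all assumptions made for illustration.

```python
from collections import defaultdict


def build_neighbors(edges):
    """Map each protein to the set of its known interaction partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # this sketch ignores self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


def degree(neighbors, protein):
    """Number of known partners of a protein (0 if it is not in the graph)."""
    return len(neighbors.get(protein, ()))


def jaccard(neighbors, p, q):
    """Overlap of two proteins' neighborhoods: |N(p) & N(q)| / |N(p) | N(q)|."""
    n_p = neighbors.get(p, set())
    n_q = neighbors.get(q, set())
    union = n_p | n_q
    return len(n_p & n_q) / len(union) if union else 0.0


# Hypothetical toy network; the identifiers are placeholders, not real proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
neighbors = build_neighbors(edges)

print(degree(neighbors, "P3"))         # degree of P3
print(sorted(neighbors["P4"]))         # neighboring nodes of P4
print(jaccard(neighbors, "P1", "P4"))  # shared-neighbor score for a candidate pair
```

Hand-crafted features like these capture only local structure. A graph embedding method of the kind introduced in the Conclusion instead learns a low-dimensional vector per protein from the graph, so such features need not be designed by hand.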
