Denoising Protein-Protein interaction network via variational graph auto-encoder for protein complex detection

Identifying protein complexes is an important problem in computational biology, as it aids the understanding of cellular functions and the design of drugs. Over the past decades, many computational methods have been proposed that mine dense subgraphs in Protein-Protein Interaction Networks (PINs). However, the high rate of false positive and false negative interactions in PINs prevents complexes from being detected accurately in the raw PINs. In this paper, we propose a denoising approach for protein complex detection based on a variational graph auto-encoder. First, we embed a PIN into a vector space with a stacked graph convolutional network (GCN), and then decide which interactions in the PIN are credible. If the probability of an interaction being credible is less than a threshold, we delete the interaction. In this way, we reconstruct a reliable PIN. We then detect protein complexes in the reconstructed PIN with several typical detection methods, including CPM, Coach, DPClus, GraphEntropy, IPCA and MCODE, and compare the results with those obtained directly from the original PIN. We conduct the empirical evaluation on four yeast PPI datasets (Gavin, Krogan, DIP and Wiphi) and two human PPI datasets (Reactome and Reactomekb), against two yeast complex benchmarks (CYC2008 and MIPS) and three human complex benchmarks (REACT, REACT_uniprotkb and CORE_COMPLEX_human), respectively. Experimental results show that the reconstructed PINs obtained by our denoising approach clearly improve complex detection performance, in most cases by over 5% and sometimes by as much as 200%. Furthermore, we compare our approach with two existing denoising methods (RWS and RedNemo) under different matching rates and on separate complex distributions. Our results show that in most cases (over 2/3), the proposed approach outperforms the existing methods.
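The thresholding step described above can be illustrated with a minimal sketch. It assumes the inner-product decoder of the standard variational graph auto-encoder, in which the probability of an interaction is the sigmoid of the dot product of the two protein embeddings; the abstract does not specify the decoder, so this is an assumption. The function names, the toy embeddings (which stand in for the GCN encoder output) and the threshold value of 0.5 are hypothetical and chosen only for illustration.

```python
import math


def edge_probability(z_u, z_v):
    """Assumed inner-product decoder: sigmoid of the dot product of two embeddings."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


def denoise_edges(edges, embeddings, threshold=0.5):
    """Delete every interaction whose decoded probability is below the threshold."""
    kept = []
    for u, v in edges:
        if edge_probability(embeddings[u], embeddings[v]) >= threshold:
            kept.append((u, v))
    return kept


# Hypothetical 2-dimensional embeddings standing in for the GCN encoder output.
embeddings = {
    "A": [1.0, 0.5],
    "B": [0.9, 0.6],
    "C": [-1.0, 0.2],
}
edges = [("A", "B"), ("A", "C"), ("B", "C")]

# Only the A-B interaction has a decoded probability of at least 0.5.
reliable = denoise_edges(edges, embeddings, threshold=0.5)
print(reliable)
```

The retained edge list plays the role of the reconstructed PIN and could then be passed to any of the complex detection methods named above.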
