A multi-network clustering method for detecting protein complexes from multiple heterogeneous networks

Background: The accurate identification of protein complexes is important for understanding cellular organization. To date, computational methods for protein complex detection have mostly focused on mining clusters from protein-protein interaction (PPI) networks. However, PPI data collected by high-throughput experimental techniques are known to be quite noisy, so it is hard to achieve reliable predictions by applying computational methods to PPI data alone. Underlying protein interactions are interactions between protein domains. Therefore, by linking the two levels through domain-protein associations, the joint analysis of PPIs and domain-domain interactions (DDIs) has the potential to improve protein complex detection. Because traditional computational methods are designed to detect protein complexes from a single PPI network, a new algorithm is needed that can effectively use the information inherent in multiple heterogeneous networks.

Results: In this paper, we introduce a novel multi-network clustering algorithm to detect protein complexes from multiple heterogeneous networks. Unlike existing protein complex identification algorithms, which analyze a single PPI network, our model jointly exploits the information inherent in PPI and DDI data to achieve more reliable predictions. Extensive experimental results on real-world data sets demonstrate that our method predicts protein complexes more accurately than other state-of-the-art protein complex identification algorithms.

Conclusions: In this work, we demonstrate that the joint analysis of the PPI network and the DDI network can help to improve the accuracy of protein complex detection.
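To make the idea of joint PPI and DDI analysis concrete, the following is a minimal Python sketch under stated assumptions; it is not the algorithm proposed in the paper, whose details are not given in this abstract. It assumes a binary protein-by-domain association matrix `S`, projects the DDI network onto protein pairs as `S B Sᵀ`, up-weights PPI edges that have domain-level support, and then clusters the combined network with a standard damped multiplicative update for symmetric non-negative matrix factorization. All function names, the toy matrices, and the weight `alpha` are hypothetical choices made for illustration.

```python
import numpy as np


def project_ddi_to_proteins(S, B):
    """For each protein pair, count the interacting domain pairs linking them.

    S: (proteins x domains) binary association matrix (assumed input).
    B: (domains x domains) symmetric DDI adjacency matrix.
    """
    support = S @ B @ S.T
    np.fill_diagonal(support, 0.0)  # ignore self-pairs
    return support


def combine_networks(A, support, alpha=0.5):
    """Up-weight PPI edges in A that are backed by domain-level support."""
    peak = support.max()
    scaled = support / peak if peak > 0 else support
    return A * (1.0 + alpha * scaled)


def symmetric_nmf(W, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate W by H @ H.T with H >= 0 (damped multiplicative updates)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))
    for _ in range(n_iter):
        numer = W @ H
        denom = H @ (H.T @ H) + eps
        H *= 0.5 + 0.5 * numer / denom
    return H


# Toy data (hypothetical): 4 proteins in a chain, 3 domains.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
B = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]], dtype=float)

support = project_ddi_to_proteins(S, B)
W = combine_networks(A, support, alpha=0.5)
H = symmetric_nmf(W, k=2)
labels = H.argmax(axis=1)  # hard cluster assignment per protein
```

In this toy example the PPI edge between proteins 1 and 2 has no supporting domain-domain interaction, so it keeps its original weight while the other two edges are up-weighted. This illustrates how DDI evidence could down-rank a potentially noisy PPI edge before clustering.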
