Detection of Protein Complexes Based on Penalized Matrix Decomposition in a Sparse Protein–Protein Interaction Network

High-throughput technology has generated large-scale protein interaction data, which are crucial to our understanding of biological organisms. Many algorithms have been developed to identify protein complexes. However, these methods are suitable only for dense protein interaction networks; their performance degrades rapidly when they are applied to sparse protein–protein interaction (PPI) networks. In this study, we developed PMDpc, a novel method based on penalized matrix decomposition (PMD), to detect protein complexes in the human protein interaction network. The method consists of three main steps. First, the adjacency matrix of the protein interaction network is normalized. Second, the normalized matrix is decomposed into three factor matrices; by imposing appropriate constraints on the factor matrices, PMDpc can detect protein complexes in sparse PPI networks. Finally, the results of our method are compared with those of other methods on a human PPI network. Experimental results show that our method not only outperforms classical algorithms such as CFinder, ClusterONE, RRW, HC-PIN, and PCE-FR, but also achieves an ideal overall performance in terms of a composite score consisting of F-measure, accuracy (ACC), and maximum matching ratio (MMR).
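
The first two steps can be illustrated with a minimal sketch. It is not the PMDpc implementation: the abstract does not specify the normalization or the constraints, so the sketch assumes symmetric degree normalization and the L1-constrained rank-one PMD updates of Witten, Tibshirani, and Hastie, applied with deflation. All function names and the parameters `c_u`, `c_v`, and `rank` are hypothetical, as is reading the nonzero entries of each column of `U` as a candidate complex.

```python
import numpy as np

def normalize_adjacency(A):
    """Assumed normalization: D^{-1/2} A D^{-1/2}, with D the degree matrix."""
    deg = A.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def soft_threshold(x, delta):
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)

def l1_constrained_unit(x, c, n_iter=50):
    """Soft-threshold x and rescale to unit L2 norm so that the L1 norm is <= c."""
    norm = np.linalg.norm(x)
    if norm == 0:
        return x
    best = x / norm
    if np.abs(best).sum() <= c:
        return best
    lo, hi = 0.0, np.abs(x).max()
    for _ in range(n_iter):  # binary search on the threshold
        mid = 0.5 * (lo + hi)
        s = soft_threshold(x, mid)
        n = np.linalg.norm(s)
        if n == 0:
            hi = mid
            continue
        s = s / n
        if np.abs(s).sum() <= c:
            best, hi = s, mid
        else:
            lo = mid
    return best

def pmd_rank1(X, c_u, c_v, n_iter=100, tol=1e-8):
    """One sparse factor (u, d, v) by alternating L1-constrained updates."""
    v = np.linalg.svd(X)[2][0]  # initialize with the leading right singular vector
    for _ in range(n_iter):
        u = l1_constrained_unit(X @ v, c_u)
        v_new = l1_constrained_unit(X.T @ u, c_v)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return u, float(u @ X @ v), v

def pmd(X, rank, c_u, c_v):
    """Return U, D, V with X approximately U @ D @ V.T, one factor at a time."""
    R = np.array(X, dtype=float)
    U, D, V = [], [], []
    for _ in range(rank):
        u, d, v = pmd_rank1(R, c_u, c_v)
        U.append(u); D.append(d); V.append(v)
        R = R - d * np.outer(u, v)  # deflate before extracting the next factor
    return np.column_stack(U), np.diag(D), np.column_stack(V)

# Toy network: a 4-clique (nodes 0-3) and a triangle (nodes 4-6) joined by one edge.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (5, 6), (3, 4)]
A = np.zeros((7, 7))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

U, D, V = pmd(normalize_adjacency(A), rank=2, c_u=2.0, c_v=2.0)
# Assumed read-out: proteins with nonzero loadings in a column form one candidate complex.
candidates = [np.flatnonzero(np.abs(U[:, k]) > 1e-8).tolist() for k in range(U.shape[1])]
```

Smaller values of `c_u` and `c_v` give sparser factors and therefore smaller candidate groups; a real application would need to tune them and to handle overlapping or duplicate candidates.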
