Protein complexes prediction via positive and unlabeled learning of the PPI networks

A protein complex (complex for short) is a set of proteins that interact with one another to carry out specific biological activities. Traditional unsupervised clustering methods are built on the idea of finding dense subgraphs in the protein-protein interaction (PPI) network. However, some complexes are not dense in the network. Supervised clustering methods treat known complexes as positive examples and subgraphs not known to be complexes as negative examples, in an attempt to discover the sparse complexes hidden in the network. Yet these unlabeled subgraphs contain many undetected complexes, and learning those true positives as negative examples seriously degrades the performance of supervised learning. Supervised complex prediction is therefore a PU (Positive and Unlabeled) learning problem, in which only positive examples are labeled. Complex prediction requires not only building a PU learning model but also deciding how to cluster the network. Building on this, this paper considers 22 attributes of a complex, such as subgraph density, topological coefficients and edge weights. We propose a complex prediction approach based on PU learning that mines complexes which traditional approaches cannot find. Experiments show that our method achieves higher accuracy than traditional approaches such as CFinder, CMC, MCODE and AP.
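The PU framing above can be illustrated with a small Python sketch. This is not the authors' method. It rests on several assumptions. The toy weighted network, the candidate subgraphs, and the helper names (`build_adjacency`, `subgraph_features`, `pu_scores`, `clique`, `chain`) are hypothetical. Only two of the 22 attributes (density and mean edge weight) are computed. The classifier is a plain logistic regression trained by gradient descent. The PU correction is the Elkan and Noto estimator from reference [15], with c estimated on the training positives rather than a held-out set.

```python
import math
from itertools import combinations

# Hypothetical toy setup: only 2 of the paper's 22 features are computed.


def build_adjacency(edges):
    """edges maps (u, v) -> interaction weight; the graph is undirected."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def subgraph_features(nodes, adj):
    """[density, mean edge weight] of the subgraph induced by `nodes`."""
    weights = [adj[u][v] for u, v in combinations(nodes, 2) if v in adj.get(u, {})]
    possible = len(nodes) * (len(nodes) - 1) / 2
    density = len(weights) / possible if possible else 0.0
    mean_weight = sum(weights) / len(weights) if weights else 0.0
    return [density, mean_weight]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Logistic model: w[0] is the bias, w[1:] are the feature weights."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_logistic(X, s, lr=1.0, epochs=5000):
    """Full-batch gradient descent on the mean log-loss for p(s=1 | x)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, s):
            err = predict(w, x) - label
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def pu_scores(X, s):
    """Elkan-Noto correction: p(y=1 | x) = g(x) / c, where c = p(s=1 | y=1)."""
    w = train_logistic(X, s)  # g(x): separates labeled from unlabeled examples
    g = [predict(w, x) for x in X]
    # Simplification: c is averaged over the training positives, not a held-out set.
    c = sum(gi for gi, si in zip(g, s) if si == 1) / sum(s)
    return [min(1.0, gi / c) for gi in g]


def clique(nodes, w):
    return {(u, v): w for u, v in combinations(nodes, 2)}


def chain(nodes, w):
    return {(u, v): w for u, v in zip(nodes, nodes[1:])}


CANDIDATES = {
    "A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3", "b4"], "C": ["c1", "c2", "c3"],
    "D": ["d1", "d2", "d3"], "E": ["e1", "e2", "e3", "e4"], "F": ["f1", "f2", "f3"],
}
EDGES = {**clique(CANDIDATES["A"], 0.9), **clique(CANDIDATES["B"], 0.8),
         **clique(CANDIDATES["C"], 0.85), **chain(CANDIDATES["D"], 0.2),
         **chain(CANDIDATES["E"], 0.3), **chain(CANDIDATES["F"], 0.25)}
# Known complexes; C is dense but unlabeled (assumed to be an undetected complex).
LABELED = {"A", "B"}

adj = build_adjacency(EDGES)
names = list(CANDIDATES)
X = [subgraph_features(CANDIDATES[k], adj) for k in names]
s = [1 if k in LABELED else 0 for k in names]
scores = dict(zip(names, pu_scores(X, s)))
for k, x in zip(names, X):
    print(k, [round(v, 2) for v in x], round(scores[k], 3))
```

In this toy setup, C resembles the labeled complexes A and B but carries the label s = 0. As a result, the raw classifier output g is pulled down for all three dense subgraphs. Dividing by c compensates for this, so C ends up scored far above the sparse chains D, E and F.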

[1] T. Vicsek et al. Uncovering the overlapping community structure of complex networks in nature and society, 2005, Nature.

[2] Sean R. Collins et al. Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae, 2007, Molecular & Cellular Proteomics.

[3] Jean-Philippe Vert et al. ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples, 2011, BMC Bioinformatics.

[4] Zong Dai et al. Identification of human protein complexes from local sub-graphs of protein-protein interaction network based on random forest with topological structure features, 2012, Analytica Chimica Acta.

[5] Xiaoli Li et al. Computational approaches for detecting protein complexes from protein interaction networks: a survey, 2010, BMC Genomics.

[6] Feng Yu et al. Predicting protein complex in protein interaction network - a supervised learning based method, 2014, 2013 IEEE International Conference on Bioinformatics and Biomedicine.

[7] Osamu Maruyama et al. Heterodimeric protein complex identification by naïve Bayes classifiers, 2013, BMC Bioinformatics.

[8] Pedro Larrañaga et al. Learning Bayesian classifiers from positive and unlabeled examples, 2007, Pattern Recognition Letters.

[9] Hunter B. Fraser et al. Using protein complexes to predict phenotypic effects of gene mutation, 2007, Genome Biology.

[10] Antonino Fiannaca et al. A knowledge-based decision support system in bioinformatics: an application to protein complex extraction, 2013, BMC Bioinformatics.

[11] Gary D. Bader et al. An automated method for finding molecular complexes in large protein interaction networks, 2003, BMC Bioinformatics.

[12] Chun-Nan Hsu et al. Identification of homologous microRNAs in 56 animal genomes, 2010, Genomics.

[13] Dmitrij Frishman et al. MIPS: analysis and annotation of proteins from whole genomes in 2005, 2005, Nucleic Acids Research.

[14] Illés J. Farkas et al. CFinder: locating cliques and overlapping modules in biological networks, 2006, Bioinformatics.

[15] Charles Elkan et al. Learning classifiers from only positive and unlabeled data, 2008, KDD.

[16] Delbert Dueck et al. Clustering by Passing Messages Between Data Points, 2007, Science.

[17] Anton J. Enright et al. An efficient algorithm for large-scale detection of protein families, 2002, Nucleic Acids Research.

[18] Sean R. Collins et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae, 2006, Nature.

[19] P. Bork et al. Proteome survey reveals modularity of the yeast cell machinery, 2006, Nature.

[20] Xing-Ming Zhao et al. Gene function prediction using labeled and unlabeled data, 2008, BMC Bioinformatics.

[21] Rémi Gilleron et al. Learning from positive and unlabeled examples, 2000, Theoretical Computer Science.

[22] Igor Jurisica et al. Protein complex prediction via cost-based clustering, 2004, Bioinformatics.

[23] Pall I. Olason et al. A human phenome-interactome network of protein complexes implicated in genetic disorders, 2007, Nature Biotechnology.

[24] Chee Keong Kwoh et al. Positive-unlabeled learning for disease gene identification, 2012, Bioinformatics.

[25] Mehmet Tan et al. Improving Positive Unlabeled Learning Algorithms for Protein Interaction Prediction, 2014, PACBB.

[26] Xiaoli Li et al. Inferring Gene-Phenotype Associations via Global Protein Complex Network Propagation, 2011, PLoS ONE.

[27] Jacques van Helden et al. Evaluation of clustering algorithms for protein-protein interaction networks, 2006, BMC Bioinformatics.

[28] Xiaoli Li et al. Ensemble Positive Unlabeled Learning for Disease Gene Identification, 2014, PLoS ONE.

[29] Mehmet Tan et al. Positive unlabeled learning for deriving protein interaction networks, 2012, Network Modeling Analysis in Health Informatics and Bioinformatics.

[30] Charles Elkan et al. Learning gene regulatory networks from only positive and unlabeled data, 2010, BMC Bioinformatics.

[31] Yanjun Qi et al. Protein complex identification by supervised graph local clustering, 2008, ISMB.

[32] Guimei Liu et al. Complex discovery from weighted PPI networks, 2009, Bioinformatics.

[33] Haiyuan Yu et al. Detecting overlapping protein complexes in protein-protein interaction networks, 2012, Nature Methods.

[34] B. Schwikowski et al. A network of protein–protein interactions in yeast, 2000, Nature Biotechnology.