Disease-specific protein complex detection in the human protein interaction network with a supervised learning method

High-throughput experimental techniques have produced a large number of human protein-protein interactions (PPIs), making it possible to construct a large-scale human PPI network and to detect human protein complexes from it with computational approaches. However, most current complex detection methods are based on graph theory and cannot exploit information from known complexes. In this paper, we present a supervised learning method to detect protein complexes in a human PPI network. The method combines biological characteristics with properties of the network to construct a rich feature set, which is used to train a regression model for protein complex detection. In addition, disease-specific PPIs are extracted from the biomedical literature and integrated into the original PPI network so that disease-specific protein complexes can be detected more effectively. Experimental results show that our method outperforms existing state-of-the-art methods. Furthermore, analysis of the breast cancer-specific complexes detected with our method provides additional biological insight into breast cancer, such as candidate susceptibility genes.
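To make the pipeline described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that integrating literature-derived PPIs amounts to a union of edge lists, that two simple topological features (density and a hypothetical "cohesion" ratio) stand in for the richer feature set, and that a plain linear regression fitted by gradient descent stands in for the regression model. All function names, protein labels, and training scores are invented toy values for illustration.

```python
from itertools import combinations

def build_network(*edge_lists):
    """Merge one or more PPI edge lists into an undirected adjacency map."""
    adj = {}
    for edges in edge_lists:
        for a, b in edges:
            if a == b:
                continue  # ignore self-interactions in this sketch
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def subgraph_features(adj, nodes):
    """Return [density, cohesion] for the subgraph induced by `nodes`.

    density  = internal edges / possible edges
    cohesion = share of the members' edge endpoints that stay inside the subgraph
    (both are illustrative stand-ins, not the paper's feature set)
    """
    nodes = set(nodes)
    n = len(nodes)
    internal = sum(1 for a, b in combinations(sorted(nodes), 2)
                   if b in adj.get(a, ()))
    density = 2 * internal / (n * (n - 1)) if n > 1 else 0.0
    total_degree = sum(len(adj.get(v, ())) for v in nodes)
    cohesion = 2 * internal / total_degree if total_degree else 0.0
    return [density, cohesion]

def fit_linear(X, y, lr=0.1, epochs=2000):
    """Least-squares linear regression (with bias) via batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            row = [1.0] + xi
            err = sum(wj * xj for wj, xj in zip(w, row)) - yi
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Score a candidate subgraph from its feature vector."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

# Toy data (assumed): a database network plus PPIs mined from the literature.
database_ppis = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
literature_ppis = [("E", "F"), ("D", "F")]
merged = build_network(database_ppis, literature_ppis)

# Toy training set (assumed): candidate subgraphs with made-up quality scores.
train = [(["A", "B", "C"], 1.0), (["C", "D", "E"], 0.3), (["A", "D"], 0.0)]
X = [subgraph_features(merged, nodes) for nodes, _ in train]
y = [score for _, score in train]
weights = fit_linear(X, y)

# Score the same candidate with and without the literature-derived PPIs.
candidate = ["D", "E", "F"]
print(predict(weights, subgraph_features(merged, candidate)))
print(predict(weights, subgraph_features(build_network(database_ppis), candidate)))
```

In this toy setting, the candidate {D, E, F} only forms a fully connected triangle once the literature-derived edges are merged in, which illustrates why integrating such PPIs could change which subgraphs a trained model ranks as likely complexes.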
