Correcting for the study bias associated with protein–protein interaction measurements reveals differences between protein degree distributions from different cancer types

Protein–protein interaction (PPI) networks are subject to multiple types of bias, some of which are rooted in technical limitations of the experimental techniques. Another source of bias is the differing frequency with which proteins have been studied for interaction partners. It is generally believed that proteins with a large number of interaction partners tend to be essential, evolutionarily conserved, and involved in disease. It has been repeatedly reported that proteins driving tumor formation have a higher number of PPI partners. However, it has been noted before that the degree distribution of PPI networks is biased toward disease proteins, which tend to have been studied more often than non-disease proteins. At the same time, for many poorly characterized proteins no interactions have been reported yet. It is unclear to what extent this study bias affects the observation that cancer proteins tend to have more PPI partners. Here, we show that the degree of a protein is a function of the number of times it has been screened for interaction partners. We present a randomization-based method that controls for this bias to decide whether a group of proteins is associated with significantly more PPI partners than the proteomic background. We apply our method to cancer proteins and observe, in contrast to previous studies, no conclusive evidence that cancer proteins have a significantly higher degree than non-cancer proteins when we compare them to proteins that have been studied equally often as bait proteins. When we compare proteins from different tumor types, a more complex picture emerges, in which proteins of certain cancer classes have significantly more interaction partners while others are associated with a lower degree. For example, proteins of several hematological cancers tend to have more interaction partners than expected by chance. Solid tumors, in contrast, are usually associated with a degree distribution similar to that of equally often studied random protein sets. We discuss the biological implications of these findings. Our work shows that accounting for biases in the PPI network is possible and increases the value of PPI data.
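
The abstract does not spell out the randomization procedure, so the following Python sketch shows only one plausible reading of it: each protein in the group of interest is replaced by a randomly drawn background protein that has been used as bait equally often, and the observed mean degree is compared with the means of these matched random sets. The function name `bait_matched_pvalue`, its parameters, the exact-match binning on bait count, the use of the mean as the test statistic, and drawing with replacement are all assumptions made for illustration, not details taken from the paper.

```python
import random
from collections import defaultdict
from statistics import mean


def bait_matched_pvalue(degree, bait_count, group, n_iter=1000, seed=0):
    """One-sided empirical p-value for 'group has a higher mean degree
    than background proteins studied equally often as bait'.

    degree     -- dict: protein -> number of interaction partners
    bait_count -- dict: protein -> number of times screened as bait
    group      -- iterable of proteins of interest (e.g. cancer proteins)

    Hypothetical helper; exact matching on bait count is an assumption.
    """
    rng = random.Random(seed)
    group = set(group)

    # Pool the background proteins by how often they were used as bait.
    pools = defaultdict(list)
    for protein in sorted(degree):
        if protein not in group:
            pools[bait_count[protein]].append(protein)

    members = sorted(group)
    for protein in members:
        if not pools[bait_count[protein]]:
            raise ValueError(f"no background protein matches {protein!r}")

    observed = mean(degree[p] for p in members)

    # Count random matched sets whose mean degree reaches the observed one.
    hits = 0
    for _ in range(n_iter):
        # Each member is matched independently (drawn with replacement).
        sample = [rng.choice(pools[bait_count[p]]) for p in members]
        if mean(degree[p] for p in sample) >= observed:
            hits += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_iter + 1)
```

With real data the bait counts are sparse for heavily studied proteins, so a practical version would probably match on ranges of bait counts rather than exact values; the sketch raises an error instead of silently widening the match.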
