Clustering Coefficients in Protein Interaction Hypernetworks

Graphs (networks) are insufficient for modeling some common types of experimentally generated protein interaction data. For example, in affinity purification experiments, one protein is pulled out of the cell along with the other proteins bound to it. These data are not intrinsically binary, so information is lost when they are modeled with a graph, which can only associate pairs of proteins. Hypergraphs, an extension of graphs that allows relationships among sets of arbitrary size, have been proposed to model this type of data. However, there is no consensus on appropriate measures for these "protein interaction hypernetworks" that are meaningful both in their interpretation and in their correspondence to a biological question (e.g., predicting the function of uncharacterized proteins or identifying new biological modules). The clustering coefficient is a measure commonly used in binary networks to gain biological insight. While multiple analogs of the clustering coefficient have been proposed for hypernetworks, their usefulness for generating biological hypotheses has not been established. We present several new definitions of a hypergraph clustering coefficient that pertain specifically to the biology of interacting proteins. We evaluate the biological meaning of these and previously proposed definitions in protein interaction hypernetworks and test their correlation with protein complexes. We conclude that hypergraph analysis offers important advantages over graph measures for non-binary data, and we discuss the clustering coefficient measures that perform best. Our work suggests that a paradigm shift is needed to gain the most insight from affinity purification assays and other non-binary data.
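The information loss described above can be illustrated with a minimal sketch. It assumes each purification is stored as a set of proteins (a hyperedge), projects those sets onto a pairwise graph, and computes the standard local clustering coefficient (the fraction of a node's neighbor pairs that are themselves linked) on the projection. The protein names and the helper functions `clique_expansion` and `local_clustering` are hypothetical and chosen for illustration only; this is not one of the hypergraph clustering coefficients defined in the paper.

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """Project hyperedges onto a simple graph (dict: node -> set of neighbors)."""
    adj = {}
    for edge in hyperedges:
        for v in edge:
            adj.setdefault(v, set())
        # Every pair of proteins seen together becomes a binary edge.
        for u, v in combinations(edge, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def local_clustering(adj, v):
    """Graph local clustering coefficient: linked neighbor pairs / all neighbor pairs."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical data: one purification that recovers A, B and C together ...
one_complex = [frozenset({"A", "B", "C"})]
# ... versus three separate purifications that each recover only a pair.
three_pairs = [frozenset({"A", "B"}), frozenset({"B", "C"}), frozenset({"A", "C"})]

g1 = clique_expansion(one_complex)
g2 = clique_expansion(three_pairs)

same_graph = g1 == g2              # the two experiments become indistinguishable
cc_a = local_clustering(g1, "A")   # graph clustering coefficient of protein A
```

Under these assumptions both data sets collapse to the same triangle, so any graph-based measure, including the clustering coefficient, assigns them identical values even though the underlying experiments differ. A measure defined directly on the hyperedges can, in principle, tell them apart.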