A simple bipartite graph projection model for clustering in networks

Graph datasets are frequently constructed by projecting a bipartite graph: two nodes are connected in the projection if they share a common neighbor in the bipartite graph. For example, a coauthorship graph is a projection of an author-publication bipartite graph. Analyzing the structure of the projected graph is common, but the consequences of the projection for such analyses are not well understood. Here, we propose and analyze a random graph model to study which properties we can expect from the projection step itself. Our model uses a Chung-Lu random graph to construct the bipartite representation, which enables us to rigorously analyze the projected graph. We show that common network properties such as sparsity, heavy-tailed degree distributions, local clustering at nodes, the inverse relationship between node degree and local clustering, and global transitivity can be explained and analyzed through this simple model. We also develop a fast sampling algorithm for our model, which we show is provably optimal for certain input distributions. Numerical simulations in which model parameters come from real-world datasets show that much of the clustering behavior in some datasets can be explained by the projection step alone.
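The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the edge rule `min(1, w_u * w_v / total)` is the standard Chung-Lu form assumed here for the bipartite case, and the naive double loop is not the fast sampling algorithm mentioned in the abstract.

```python
import random
from itertools import combinations


def sample_bipartite_chung_lu(left_w, right_w, rng):
    """Sample a bipartite graph; return {right node: set of left neighbors}.

    Assumed edge rule: P(u ~ v) = min(1, left_w[u] * right_w[v] / total),
    with total = sum(right_w). The naive loop costs O(|L| * |R|).
    """
    total = sum(right_w)
    members = {v: set() for v in range(len(right_w))}
    for u, wu in enumerate(left_w):
        for v, wv in enumerate(right_w):
            if rng.random() < min(1.0, wu * wv / total):
                members[v].add(u)
    return members


def project(members, n_left):
    """Connect two left nodes whenever they share a right neighbor."""
    adj = {u: set() for u in range(n_left)}
    for group in members.values():
        for a, b in combinations(sorted(group), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    left_w = [2.0] * 200   # e.g. expected papers per author (made-up values)
    right_w = [4.0] * 100  # e.g. expected authors per paper (made-up values)
    members = sample_bipartite_chung_lu(left_w, right_w, rng)
    adj = project(members, len(left_w))
    avg = sum(local_clustering(adj, u) for u in adj) / len(adj)
    print(f"average local clustering in the projection: {avg:.3f}")
```

Each right node with d neighbors contributes a clique on d nodes to the projection, which is why a projected graph can show substantial clustering even when the bipartite edges are sampled independently.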
