Edge Exchangeable Models for Interaction Networks

ABSTRACT Many modern network datasets arise from processes of interactions in a population, such as phone calls, email exchanges, co-authorships, and professional collaborations. In such interaction networks, the edges are the fundamental statistical units, which makes a framework for edge-labeled networks more appropriate for statistical analysis. In this context, we initiate the study of edge exchangeable network models and explore their basic statistical properties. Several theoretical and practical features make edge exchangeable models better suited to many applications in network analysis than more common vertex-centric approaches. In particular, edge exchangeable models allow for sparse structure and power-law degree distributions, both of which are widely observed empirical properties that more conventional approaches cannot handle naturally. Our discussion culminates in the Hollywood model, which we identify here as the canonical family of edge exchangeable distributions. The Hollywood model is computationally tractable, admits a clear interpretation, exhibits good theoretical properties, and performs reasonably well in estimation and prediction, as we demonstrate on real network datasets. As a generalization of the Hollywood model, we further identify the vertex components model as a nonparametric subclass of models with a convenient stick-breaking construction.
