Block-Approximated Exponential Random Graphs

An important challenge in the field of exponential random graphs (ERGs) is fitting non-trivial ERGs to large graphs. Using fast matrix block-approximation techniques, we propose a framework for approximating such non-trivial ERGs that results in dyadic independence (i.e., edge-independent) distributions, while still being able to meaningfully model local information of the graph (e.g., degrees) as well as global information (e.g., clustering coefficient, assortativity), if desired. This allows one to efficiently generate random networks with properties similar to those of an observed network, and the models can be used for several downstream tasks such as link prediction. Our methods scale to sparse graphs consisting of millions of nodes. Empirical evaluation on link prediction demonstrates competitiveness in both speed and accuracy with state-of-the-art methods, which are typically based on embedding the graph into some low-dimensional space, showcasing the potential of a more direct and interpretable probabilistic model for this task.
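
To make the notion of a dyadic independence distribution concrete, the following is a minimal sketch, not the fitting procedure of the paper. It assumes that a block assignment (`block_of`) and a block-level edge-probability matrix (`block_prob`) are already given; both names, and the helper functions, are hypothetical and chosen only for illustration. Each dyad is then drawn independently, and the same probabilities serve directly as link-prediction scores.

```python
import numpy as np


def sample_block_graph(block_of, block_prob, rng):
    """Sample an undirected, loop-free graph whose edges are independent.

    block_of   : length-n integer array with the block index of each node
                 (hypothetical input, assumed given).
    block_prob : k x k symmetric matrix of between-block edge probabilities
                 (hypothetical input, assumed given).
    """
    n = len(block_of)
    # Expand block-level probabilities to a dense n x n matrix.
    # This is only sensible for small n; it is for illustration.
    P = block_prob[np.ix_(block_of, block_of)]
    # Draw every dyad i < j independently, then mirror the upper triangle
    # so that the adjacency matrix is symmetric with an empty diagonal.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, P


def link_scores(P, pairs):
    """Score candidate links by their edge probability under the model."""
    return np.array([P[i, j] for i, j in pairs])


rng = np.random.default_rng(0)
block_of = np.array([0, 0, 0, 1, 1, 1])           # two blocks of three nodes
block_prob = np.array([[0.9, 0.1], [0.1, 0.8]])   # assumed, not fitted
A, P = sample_block_graph(block_of, block_prob, rng)
scores = link_scores(P, [(0, 1), (0, 4)])         # within-block vs. between-block pair
```

Because the edges are independent, sampling a network and scoring a node pair both reduce to reading off one probability per dyad, which is what makes this class of models inexpensive to use once the probabilities are available.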
