CONSISTENCY UNDER SAMPLING OF EXPONENTIAL RANDOM GRAPH MODELS

The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consist only of a sampled sub-network. Parameters for the whole network, which are the quantities of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGMs' expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.
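
As a rough illustration of what consistency under sampling means, the following sketch enumerates every graph on 4 nodes, marginalizes an ERGM down to the subgraph induced on 3 of those nodes, and compares the result with the same ERGM defined directly on 3 nodes. It is not the paper's method, only a brute-force check on tiny graphs. The choice of edge and triangle counts as sufficient statistics, the parameter values, and all function names (`ergm`, `marginal`, `projectivity_gap`) are assumptions made for this example.

```python
from itertools import combinations, product
from math import exp


def pairs(n):
    """All unordered node pairs (dyads) on nodes 0..n-1."""
    return list(combinations(range(n), 2))


def stats(edges, n):
    """Return (edge count, triangle count) for a graph given as a set of dyads."""
    e = set(edges)
    tri = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in e and (a, c) in e and (b, c) in e
    )
    return len(e), tri


def ergm(n, theta):
    """Exact ERGM distribution over all graphs on n nodes, by enumeration.

    theta = (edge parameter, triangle parameter); an illustrative choice.
    """
    dyads = pairs(n)
    weights = {}
    for bits in product([0, 1], repeat=len(dyads)):
        g = frozenset(d for d, b in zip(dyads, bits) if b)
        n_edges, n_tri = stats(g, n)
        weights[g] = exp(theta[0] * n_edges + theta[1] * n_tri)
    z = sum(weights.values())  # normalizing constant
    return {g: w / z for g, w in weights.items()}


def marginal(dist, m):
    """Marginal distribution of the subgraph induced on nodes 0..m-1."""
    keep = set(pairs(m))
    out = {}
    for g, p in dist.items():
        sub = frozenset(g & keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def projectivity_gap(theta, small=3, large=4):
    """Largest absolute difference between the small-graph ERGM and the
    marginal of the large-graph ERGM with the same parameters."""
    direct = ergm(small, theta)
    induced = marginal(ergm(large, theta), small)
    return max(abs(direct[g] - induced[g]) for g in direct)


if __name__ == "__main__":
    print("edges only:       ", projectivity_gap((-1.0, 0.0)))
    print("edges + triangles:", projectivity_gap((0.0, 1.0)))
```

With a zero triangle parameter the model reduces to independent edges, and the gap should vanish up to floating-point error. With a nonzero triangle parameter the gap is clearly nonzero, so parameters fitted on the sampled sub-network would not carry the same meaning for the whole network.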
