A survey of discrete methods in (algebraic) statistics for networks

Sampling algorithms, hypergraph degree sequences, and polytopes play a crucial role in the statistical analysis of network data. This article offers a brief overview of open problems in this area of discrete mathematics from the point of view of a particular family of statistical models for networks called exponential random graph models (ERGMs). The problems and their underlying constructions are also related to well-known concepts in commutative algebra and to graph-theoretic concepts in computer science. We outline a few lines of recent work that highlight the natural connections among these fields and gather them into a set of open problems. While these problems are often of interest in discrete mathematics in their own right, the emphasis here is on their statistical relevance, in the hope that these lines of research will not remain disjoint. The specific open problems and general research questions suggested here should advance both the theory of algebraic statistics and the applied tools for rigorous statistical analysis of networks.
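
To make the link between degree sequences and sampling algorithms concrete, here is a minimal Python sketch restricted to ordinary simple graphs (the hypergraph case is harder and is not covered). It builds one realization of a degree sequence with the classical Havel-Hakimi procedure and then takes steps of the degree-preserving "2-switch" Markov chain. The chain's state space is the set of all simple graphs with that degree sequence, that is, a fiber of the beta-model, the ERGM whose sufficient statistic is the degree sequence; it is a classical fact that 2-switches connect all such graphs. The function names `havel_hakimi`, `two_switch_step`, and `degree_sequence` are hypothetical, and this textbook chain is an illustration, not an algorithm taken from this article.

```python
import random


def havel_hakimi(degrees):
    """Return a set of edges (frozensets) realizing `degrees` as a simple
    graph on vertices 0..n-1, or None if the sequence is not graphical.
    Assumes a list of nonnegative integers."""
    remaining = [[d, v] for v, d in enumerate(degrees)]  # [residual degree, vertex]
    edges = set()
    while True:
        # Process the vertex with the largest residual degree first.
        remaining.sort(key=lambda p: -p[0])
        if not remaining or remaining[0][0] == 0:
            return edges  # every degree has been satisfied
        d, v = remaining[0]
        if d > len(remaining) - 1:
            return None  # not enough other vertices to connect to
        remaining[0][0] = 0
        # Join v to the d vertices with the next-largest residual degrees.
        for i in range(1, d + 1):
            remaining[i][0] -= 1
            if remaining[i][0] < 0:
                return None  # the reduced sequence is not graphical
            edges.add(frozenset((v, remaining[i][1])))


def degree_sequence(edges, n):
    """Degree of each vertex 0..n-1 in an edge set of 2-element frozensets."""
    deg = [0] * n
    for e in edges:
        for v in e:
            deg[v] += 1
    return deg


def two_switch_step(edges, rng):
    """One step of the 2-switch chain, modifying `edges` in place.
    Picks two edges {a,b},{c,d} uniformly and proposes one of the two
    possible rewirings with probability 1/2 each. The proposal is symmetric
    and invalid proposals leave the graph unchanged, so the uniform
    distribution on the fiber is stationary."""
    if len(edges) < 2:
        return
    e1, e2 = rng.sample(sorted(edges, key=sorted), 2)
    a, b = sorted(e1)
    c, d = sorted(e2)
    if rng.random() < 0.5:
        c, d = d, c  # choose between {a,c},{b,d} and {a,d},{b,c}
    new1, new2 = frozenset((a, c)), frozenset((b, d))
    # Reject if the switch would create a loop or a repeated edge.
    if a == c or b == d or new1 in edges or new2 in edges:
        return
    edges -= {e1, e2}
    edges |= {new1, new2}


if __name__ == "__main__":
    degrees = [3, 2, 2, 2, 1]
    g = havel_hakimi(degrees)
    rng = random.Random(0)
    for _ in range(1000):
        two_switch_step(g, rng)
    # Every step preserves the degree sequence, i.e. stays in the fiber.
    assert degree_sequence(g, len(degrees)) == degrees
    print(sorted(tuple(sorted(e)) for e in g))
```

Rejecting switches that would create a loop or a repeated edge, rather than resampling, is what keeps the proposal symmetric. Many steps give approximately uniform samples from the fiber, but how many steps suffice (the mixing time) is not addressed by this sketch.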