Review of statistical network analysis: models, algorithms, and software

The analysis of network data is a rapidly growing area, both within and outside the discipline of statistics. This review provides a concise summary of methods and models used in the statistical analysis of network data, including the Erdős–Rényi model, the exponential family class of network models, and recently developed latent variable models. Many of the methods and models are illustrated by application to the well-known Zachary karate dataset. Software routines available for implementing the methods are emphasized throughout. The aim of this paper is to review many common classes of network models in enough detail to whet the appetite and to point the way to further reading. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012 (This material is based upon work supported by the Science Foundation Ireland under Grant No. 08/SRC/I1407: Clique: Graph & Network Analysis Cluster.)
