Bayesian analysis for partially observed network data, missing ties, attributes and actors

Abstract We consider partially observed network data as defined in Handcock and Gile (2010). More specifically, we introduce an elaboration of the Bayesian data augmentation scheme of Koskinen et al. (2010) that uses the exchange algorithm (Caimo and Friel, 2011) for inference in the exponential random graph model (ERGM) when tie variables are only partly observed. Under a set of standard assumptions, we illustrate how to generate draws from the posterior of the parameters and of the unobserved tie variables, using empirical network data in which 74% of the tie variables are unobserved. One of these assumptions is that covariates are fixed and completely observed. In practice, covariates are also likely to be only partially observed, so we propose a further extension of the data augmentation algorithm to handle missing attributes. We provide an illustrative example of parameter inference in which nearly 30% of dyads are affected by missing attributes (which enter the model through, e.g., homophily effects). Another assumption liable to be violated is that all actors are known; when it fails, the network contains “covert actors”. We briefly discuss various aspects of this problem with reference to the Sageman (2004) data set on suspected terrorists. We conclude by identifying some areas in need of further research.
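To make the combination of data augmentation and the exchange algorithm concrete, the following is a minimal sketch and not the authors' implementation. It assumes an edges-only ERGM (a single density parameter `theta`), a normal prior, a symmetric random-walk proposal, and ties that are missing at random; the function names `sample_graph` and `augmented_exchange` are hypothetical. With an edges-only model the ties are independent, so both the missing ties and the auxiliary graph can be drawn exactly. A model with dependence terms would instead need an MCMC draw for the auxiliary graph and a conditional update for each missing tie.

```python
import math
import random


def sample_graph(theta, n_dyads, rng):
    """Exact draw from an edges-only ERGM: every dyad is an independent
    Bernoulli tie with log-odds theta (assumption of this sketch)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return [1 if rng.random() < p else 0 for _ in range(n_dyads)]


def augmented_exchange(observed, n_iter=2000, step=0.5, prior_sd=10.0, seed=0):
    """Alternate between imputing missing ties and updating theta.

    observed: list of tie values (0 or 1), with None for an unobserved dyad.
    Returns the list of sampled theta values, one per iteration.
    """
    rng = random.Random(seed)
    n_dyads = len(observed)
    missing = [i for i, v in enumerate(observed) if v is None]
    y = [0 if v is None else v for v in observed]  # completed graph
    theta = 0.0
    draws = []
    for _ in range(n_iter):
        # Data augmentation: draw the missing ties given the current theta.
        p = 1.0 / (1.0 + math.exp(-theta))
        for i in missing:
            y[i] = 1 if rng.random() < p else 0
        s_y = sum(y)  # sufficient statistic (edge count) of completed graph

        # Exchange step: propose theta_new, then draw an auxiliary graph
        # from the model at theta_new.
        theta_new = theta + rng.gauss(0.0, step)
        s_aux = sum(sample_graph(theta_new, n_dyads, rng))

        # The normalising constants cancel in the exchange ratio, leaving
        # (theta_new - theta) * (s(y) - s(y_aux)) plus the log prior ratio.
        log_ratio = (theta_new - theta) * (s_y - s_aux)
        log_ratio += (theta ** 2 - theta_new ** 2) / (2.0 * prior_sd ** 2)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = theta_new
        draws.append(theta)
    return draws


# Toy data (made up for illustration): 10 ties, 20 non-ties, 10 unobserved.
toy = [1] * 10 + [0] * 20 + [None] * 10
theta_draws = augmented_exchange(toy, n_iter=500)
```

The key design point is that the acceptance ratio only needs the sufficient statistics of the completed graph and of the auxiliary graph, so the intractable normalising constant of the ERGM is never evaluated.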

[1]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[2]  David Krackhardt,et al.  Cognitive social structures , 1987 .

[3]  David Bruce Wilson,et al.  Exact sampling with coupled Markov chains and applications to statistical mechanics , 1996, Random Struct. Algorithms.

[4]  W. Wong,et al.  The calculation of posterior distributions by data augmentation , 1987 .

[5]  Peng Wang,et al.  Modelling a disease-relevant contact network of people who inject drugs , 2013, Soc. Networks.

[6]  Peng Wang,et al.  Closure, connectivity and degree distributions: Exponential random graph (p*) models for directed social networks , 2009, Soc. Networks.

[7]  Alberto Caimo,et al.  Bayesian model selection for exponential random graph models , 2012, Soc. Networks.

[8]  Peng Wang,et al.  Exponential random graph models for multilevel networks , 2013, Soc. Networks.

[9]  Zoubin Ghahramani,et al.  MCMC for Doubly-intractable Distributions , 2006, UAI.

[10]  Paul E. Green,et al.  Bayesian Methods for Generalized Linear Models , 1999 .

[11]  David Bruce Wilson,et al.  How to couple from the past using a read-once source of randomness , 1999, Random Struct. Algorithms.

[12]  Peter D. Hoff,et al.  Multiplicative latent factor models for description and prediction of social networks , 2009, Comput. Math. Organ. Theory.

[13]  Ranran Wang Bayesian Inference of Exponential-family Random Graph Models for Social Networks , 2011 .

[14]  S. Chib,et al.  Understanding the Metropolis-Hastings Algorithm , 1995 .

[15]  M. Schweinberger Instability, Sensitivity, and Degeneracy of Discrete Exponential Families , 2011, Journal of the American Statistical Association.

[16]  S. Lipsitz,et al.  Missing-Data Methods for Generalized Linear Models , 2005 .

[17]  Mark Huisman,et al.  Imputation of missing network data: Some simple procedures , 2009, J. Soc. Struct..

[18]  Garry Robins,et al.  Network models for social selection processes , 2001, Soc. Networks.

[19]  P. Killworth,et al.  Informant accuracy in social network data IV: a comparison of clique-level structure in behavioral and cognitive network data , 1979 .

[20]  J. Jonasson The random triangle model , 1999, Journal of Applied Probability.

[21]  Tom A. B. Snijders,et al.  Markov Chain Monte Carlo Estimation of Exponential Random Graph Models , 2002, J. Soc. Struct..

[22]  Carter T. Butts,et al.  Network inference, error, and informant (in)accuracy: a Bayesian approach , 2003, Soc. Networks.

[23]  Martina Morris,et al.  statnet: Software Tools for the Representation, Visualization, Analysis and Simulation of Network Data. , 2008, Journal of statistical software.

[24]  Mark S Handcock,et al.  MODELING SOCIAL NETWORKS FROM SAMPLED DATA. , 2010, The annals of applied statistics.

[25]  Alberto Caimo,et al.  Bayesian inference for exponential random graph models , 2010, Soc. Networks.

[26]  D. Hunter,et al.  Inference in Curved Exponential Family Models for Networks , 2006 .

[27]  Johan Koskinen,et al.  The Linked Importance Sampler Auxiliary Variable Metropolis Hastings Algorithm for Distributions with Intractable Normalising Constants , 2008 .

[28]  P. Pattison,et al.  Network models for social influence processes , 2001 .

[29]  P. Pattison,et al.  Conditional estimation of exponential random graph models from snowball sampling designs , 2013 .

[30]  T. Suesse Marginalized Exponential Random Graph Models , 2012 .

[31]  David E. Booth,et al.  Analysis of Incomplete Multivariate Data , 2000, Technometrics.

[32]  E. Lazega Introduction: Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership , 2001 .

[33]  Douglas D. Heckathorn,et al.  Comment: Snowball versus Respondent-Driven Sampling , 2011 .

[34]  Steven B. Andrews,et al.  Structural Holes: The Social Structure of Competition , 1995, The SAGE Encyclopedia of Research Design.

[35]  T. Snijders,et al.  Simulation, Estimation, and Goodness of Fit , 2013 .

[36]  Peter D. Hoff,et al.  Latent Space Approaches to Social Network Analysis , 2002 .

[37]  Nan M. Laird,et al.  Regression Analysis for Categorical Variables with Outcome Subject to Nonignorable Nonresponse , 1988 .

[38]  Johan Koskinen,et al.  Using latent variables to account for heterogeneity in exponential family random graph models , 2009 .

[39]  Carl-Erik Särndal,et al.  Model Assisted Survey Sampling , 1997 .

[40]  Ove Frank,et al.  http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained , 2007 .

[41]  Joseph G. Ibrahim,et al.  A conditional model for incomplete covariates in parametric regression models , 1996 .

[42]  J G Ibrahim,et al.  Parameter estimation from incomplete data in binomial regression when the missing data mechanism is nonignorable. , 1996, Biometrics.

[43]  Mark S Handcock,et al.  Respondent-Driven Sampling: An Assessment of Current Methodology , 2009, Sociological methodology.

[44]  C. J. Rhodes,et al.  Social network topology: a Bayesian approach , 2007, J. Oper. Res. Soc..

[45]  D. Rubin INFERENCE AND MISSING DATA , 1975 .

[46]  M. Newman,et al.  Hierarchical structure and the prediction of missing links in networks , 2008, Nature.

[47]  Joseph G Ibrahim,et al.  Bayesian Analysis for Generalized Linear Models with Nonignorably Missing Covariates , 2005, Biometrics.

[48]  Ove Frank,et al.  Survey sampling in networks , 2011 .

[49]  Peng Wang,et al.  Recent developments in exponential random graph (p*) models for social networks , 2007, Soc. Networks.

[50]  R. Little,et al.  Maximum likelihood estimation for mixed continuous and categorical data with missing values , 1985 .

[51]  Garry Robins,et al.  Illustrations: simulation, estimation and goodness of fit , 2013 .

[52]  L. Tierney Markov Chains for Exploring Posterior Distributions , 1994 .

[53]  W J Krzanowski,et al.  Mixtures of continuous and categorical variables in discriminant analysis. , 1980, Biometrics.

[54]  William Richards,et al.  Nonrespondents in Communication Network Studies , 1992 .

[55]  J. Propp,et al.  Exact sampling with coupled Markov chains and applications to statistical mechanics , 1996 .

[56]  Tom A. B. Snijders Conditional Marginalization for Exponential Random Graph Models , 2010 .

[57]  Garry Robins,et al.  Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation , 2010 .

[58]  T. Snijders,et al.  Modeling Social Networks: Next Steps , 2013 .

[59]  Roderick J. A. Little Regression with Missing X's: A Review , 1992 .

[60]  Joseph G. Ibrahim,et al.  Bayesian methods for generalized linear models with covariates missing at random , 2002 .

[61]  Ove Frank,et al.  Estimation of Offending and Co-offending Using Available Data with Model Support , 2007 .

[62]  Garry Robins,et al.  Missing data in networks: exponential random graph (p∗) models for networks with non-respondents , 2004, Soc. Networks.

[63]  David Liben-Nowell,et al.  The link-prediction problem for social networks , 2007 .

[64]  Marc Sageman,et al.  Understanding terror networks. , 2004, International journal of emergency mental health.

[65]  Nicole A. Lazar,et al.  Statistical Analysis With Missing Data , 2003, Technometrics.

[66]  M. Handcock Assessing Degeneracy in Statistical Models of Social Networks , 2005 .

[67]  David R. Hunter,et al.  Curved exponential family models for social networks , 2007, Soc. Networks.

[68]  Julien Brailly,et al.  Exponential Random Graph Models for Social Networks , 2014 .

[69]  Jeffrey A Smith,et al.  Macrostructure from Microstructure , 2012, Sociological methodology.

[70]  Garry Robins,et al.  Closure, connectivity and degrees: New specifications for exponential random graph (p*) models for directed social networks , 2006 .

[71]  P. Pattison,et al.  New Specifications for Exponential Random Graph Models , 2006 .

[72]  Joseph G. Ibrahim,et al.  Missing covariates in generalized linear models when the missing data mechanism is non‐ignorable , 1999 .

[73]  P. Carrington CO‐OFFENDING AND THE DEVELOPMENT OF THE DELINQUENT CAREER* , 2009 .

[74]  J. Ibrahim Incomplete Data in Generalized Linear Models , 1990 .

[75]  Dawn Iacobucci,et al.  Statistical Modelling of One-Mode and Two-Mode Networks: Simultaneous Analysis of Graphs and Bipartite Graphs , 1991 .

[76]  P. Jones,et al.  Inferring missing links in partially observed social networks , 2009, J. Oper. Res. Soc..

[77]  Martina Morris,et al.  Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. , 2008, Journal of statistical software.

[78]  R. Little Models for Nonresponse in Sample Surveys , 1982 .

[79]  Johan Koskinen,et al.  Essays on Bayesian Inference for Social Networks , 2004 .

[80]  Gueorgi Kossinets Effects of missing data in social networks , 2006, Soc. Networks.

[81]  Mark S. Handcock,et al.  Modeling Social Networks with Sampled or Missing Data , 2007 .

[82]  N M Laird,et al.  Maximum likelihood analysis of generalized linear models with missing covariates , 1999, Statistical methods in medical research.