Multiple imputation for missing edge data: A predictive evaluation method with application to Add Health

Recent developments have made model-based imputation of network data feasible in principle, but the extant literature provides few practical examples of its use. In this paper we consider 14 schools from the widely used In-School Survey of Add Health (Harris et al., 2009), applying an ERGM-based estimation and simulation approach to impute the missing network data for each school. Add Health's complex study design leads to multiple types of missingness, and we introduce practical techniques for handling each. We also develop a cross-validation-based method, Held-Out Predictive Evaluation (HOPE), for assessing this approach. Our results suggest that ERGM-based imputation of edge variables is a viable approach to the analysis of complex studies such as Add Health, provided that care is used in understanding and accounting for the study design.
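The abstract does not spell out the HOPE procedure, so the following is only a minimal sketch of the general idea of held-out evaluation for edge variables, under stated assumptions. Observed dyads are split into folds, each fold is treated as if it were missing, a model is fit to the remaining dyads, and the held-out tie values are scored under that model. The function name `held_out_log_score`, the fold scheme, and the log-score metric are hypothetical choices for illustration, and a homogeneous Bernoulli graph stands in for the ERGM that would be fit in practice.

```python
import math
import random


def held_out_log_score(edges, observed_dyads, n_folds=5, seed=0):
    """Average held-out log predictive probability per dyad.

    edges: set of (i, j) pairs with an observed tie.
    observed_dyads: list of (i, j) pairs whose tie status was observed.
    """
    rng = random.Random(seed)
    dyads = list(observed_dyads)
    rng.shuffle(dyads)
    # Partition the observed dyads into n_folds roughly equal folds.
    folds = [dyads[k::n_folds] for k in range(n_folds)]

    total, count = 0.0, 0
    for k, held_out in enumerate(folds):
        # Treat the held-out dyads as if they were missing.
        training = [d for j, fold in enumerate(folds) if j != k for d in fold]
        # Stand-in model (an assumption, not the paper's method): a
        # homogeneous Bernoulli graph with add-one smoothing. An ERGM fit
        # to the training dyads would replace this step.
        ties = sum(1 for d in training if d in edges)
        p = (ties + 1) / (len(training) + 2)
        # Score each held-out edge variable under the fitted model.
        for d in held_out:
            total += math.log(p if d in edges else 1.0 - p)
            count += 1
    return total / count


# Toy directed network on six nodes with four observed ties.
nodes = range(6)
dyads = [(i, j) for i in nodes for j in nodes if i != j]
edges = {(0, 1), (1, 0), (2, 3), (4, 5)}
score = held_out_log_score(edges, dyads, n_folds=5, seed=1)
```

A score closer to zero indicates that the model assigns higher probability to the held-out tie values. Comparing this score across candidate models is one way such an evaluation could be used.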
