Using Social Network Information for Survey Estimation

Abstract: Model-based and model-assisted methods of survey estimation aim to improve the precision of estimators of the population total or mean relative to methods based on the nonparametric Horvitz-Thompson estimator. These methods often use a linear regression model defined in terms of auxiliary variables whose values are assumed known for all population units. Network information is another form of auxiliary information that might increase the precision of these estimators, particularly if it is reasonable to assume that linked population units have similar values of the survey variable. Linear models that use networks as a source of auxiliary information include autocorrelation, disturbance, and contextual models. In this article we focus on social networks and investigate how much of the population network structure needs to be known for estimation methods based on these models to be useful. In particular, we use simulation to compare the performance of the best linear unbiased predictor under a model that ignores the network with that of model-based estimators that incorporate network information. Our results suggest that incorporating network information via a contextual model is the most appropriate approach. We also show that one does not need to know the full population network, but that knowledge of the partial network linking the sampled population units to the non-sampled population units is necessary. Finally, we provide an estimator of the mean-squared error that supports an informed decision about whether to use the contextual information, and we present results showing that this adaptive strategy leads to higher precision.
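To make the comparison described in the abstract concrete, the following is a minimal sketch, not the authors' simulation design. It assumes a homoscedastic linear model with uncorrelated errors, under which the predictor of the total is the observed sample sum plus ordinary least-squares predictions for the non-sampled units. The random network, the coefficient values, the row-normalised weight matrix `W`, and the function names `blup_total` and `simulate` are all hypothetical choices made for illustration. The sketch also assumes the full network is known, which is stronger than the partial-network knowledge the abstract says is necessary.

```python
import numpy as np


def blup_total(y_s, X_s, X_r):
    """Predict a population total under a working linear model.

    Returns the observed sample sum plus OLS predictions for the
    non-sampled units (assumes uncorrelated, equal-variance errors).
    """
    beta, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
    return y_s.sum() + (X_r @ beta).sum()


def simulate(N=200, n=50, p_link=0.05, noise=1.0, seed=0):
    """Compare a no-network predictor with a contextual-model predictor."""
    rng = np.random.default_rng(seed)

    # Hypothetical symmetric random network on N population units.
    upper = np.triu(rng.random((N, N)) < p_link, k=1)
    A = (upper | upper.T).astype(float)

    # Row-normalised weights; isolated units keep an all-zero row.
    deg = A.sum(axis=1)
    W = A / np.maximum(deg, 1.0)[:, None]

    x = rng.normal(size=N)   # auxiliary variable, known for all units
    wx = W @ x               # contextual covariate: mean of neighbours' x

    # Assumed data-generating model with a contextual effect (illustrative values).
    y = 1.0 + 2.0 * x + 3.0 * wx + noise * rng.normal(size=N)

    # Simple random sample without replacement.
    s = np.zeros(N, dtype=bool)
    s[rng.choice(N, size=n, replace=False)] = True

    X_plain = np.column_stack([np.ones(N), x])      # ignores the network
    X_ctx = np.column_stack([np.ones(N), x, wx])    # contextual model

    return {
        "true": y.sum(),
        "no_network": blup_total(y[s], X_plain[s], X_plain[~s]),
        "contextual": blup_total(y[s], X_ctx[s], X_ctx[~s]),
    }


if __name__ == "__main__":
    res = simulate()
    for name in ("no_network", "contextual"):
        print(name, "error:", res[name] - res["true"])
```

Repeating `simulate` over many seeds and comparing the squared errors of the two predictors would give a rough analogue of the simulation comparison the abstract refers to, under the stated assumptions only.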
