Models and Methods in Social Network Analysis: Network Sampling and Model Fitting

Introduction

Survey methodology has a statistical tradition of focusing on populations and samples. Samples of population units are selected according to probabilistic sampling designs. Because the design is under the investigator's control, selection bias and the uncertainty of estimators and tests can be quantified, so that inference can be drawn with confidence. Early publications in the field were devoted to explaining the benefits of probability sampling designs over convenience sampling of various kinds. Probability sampling is the term usually used when the selection probabilities are known for all samples and every population unit has a nonzero probability of being selected.

The focus on controlled randomization can be contrasted with probabilistic modeling of uncertainty. In many surveys, sampling variation is not the main source of uncertainty; variation also arises from measurement errors, response imperfections, observation difficulties, and other repetitive factors that can be specified by probabilistic assumptions. The superpopulation concept can likewise be seen as a way to model, probabilistically, uncertainty that arises neither from imposed randomization nor from variation due to repetitive incidents. Modern statistical survey methodology distinguishes between design-based and model-based approaches, and often uses an intermediate approach that combines model-assisted techniques with design-based inference. A pure probabilistic modeling approach focuses on the data and tries to imitate how they are generated. A good model fit is important for reliable inference, but it does not necessarily mean that the sampling design is an explicit part of the model's data-generating mechanism.
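The design-based idea, that known and nonzero selection probabilities let bias and uncertainty be quantified, can be illustrated with a minimal sketch. It assumes a tiny hypothetical population (`Y`) and a hypothetical unequal-probability design (`DESIGN`) over all samples of size 2. It uses the Horvitz-Thompson estimator, a standard design-based estimator that the text does not name, and enumerates every possible sample to compute exact design expectations. Under these assumptions, weighting by the inclusion probabilities gives a design-unbiased estimate of the total, while an unweighted estimate that ignores the design does not.

```python
from fractions import Fraction

# Hypothetical finite population of N = 4 units with study variable y.
Y = [3, 7, 2, 9]

# Hypothetical sampling design: every sample of size n = 2 with its known
# selection probability p(s). Probabilities are unequal but sum to 1.
DESIGN = {
    (0, 1): Fraction(1, 4),
    (0, 2): Fraction(1, 8),
    (0, 3): Fraction(1, 8),
    (1, 2): Fraction(1, 8),
    (1, 3): Fraction(1, 4),
    (2, 3): Fraction(1, 8),
}


def inclusion_probabilities(design, population_size):
    """pi_i = sum of p(s) over all samples s that contain unit i."""
    pi = [Fraction(0)] * population_size
    for sample, prob in design.items():
        for unit in sample:
            pi[unit] += prob
    return pi


def horvitz_thompson_total(sample, y, pi):
    """Estimate the population total by weighting each sampled y_i by 1 / pi_i."""
    return sum(Fraction(y[i]) / pi[i] for i in sample)


def naive_total(sample, y):
    """Estimate the total as N times the unweighted sample mean, ignoring the design."""
    return Fraction(len(y)) * sum(Fraction(y[i]) for i in sample) / len(sample)


def design_moments(design, estimator):
    """Exact design expectation and variance of an estimator, enumerating all samples."""
    mean = sum(prob * estimator(s) for s, prob in design.items())
    var = sum(prob * (estimator(s) - mean) ** 2 for s, prob in design.items())
    return mean, var


pi = inclusion_probabilities(DESIGN, len(Y))
# Probability sampling: sample probabilities sum to 1 and every pi_i is nonzero.
assert sum(DESIGN.values()) == 1 and all(p > 0 for p in pi)

ht_mean, ht_var = design_moments(DESIGN, lambda s: horvitz_thompson_total(s, Y, pi))
naive_mean, naive_var = design_moments(DESIGN, lambda s: naive_total(s, Y))

print("true total:", sum(Y))
print("HT expectation:", ht_mean, "variance:", ht_var)
print("naive expectation:", naive_mean, "variance:", naive_var)
```

Because the whole design is enumerated, the expectations are exact rather than simulated: the weighted estimator's expectation equals the true total of 21, whereas the unweighted estimator's expectation is 89/4, so its selection bias can be read off directly. Exact `Fraction` arithmetic is used only to avoid floating-point noise in that comparison.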