Survey Sampling
Survey sampling is one of the most commonly used data collection methods for social scientists. We begin by describing the simplest method of survey sampling, called simple random sampling. Suppose that we are conducting election polling and are interested in estimating the proportion of voters who support Obama in a battleground state, say Florida. There are N voters in this state, and we call this population of voters P. Thus, N denotes the population size. We assume that the complete list of N voters is available, allowing us to sample a subset of them from this list. Such a list is called a sampling frame. Before we begin our sampling, we determine the total number of voters we are going to interview. This number represents the sample size and is denoted by n.
Simple random sampling refers to the procedure in which a researcher randomly samples n voters from the list of N voters with equal probability. There are two key characteristics of this procedure. First, we sample exactly n voters without replacement. That is, every voter is sampled at most once. Second, each voter has an equal probability of being sampled. Because exactly n of the N voters on the sampling frame are selected, this sampling probability is equal to n/N for every voter.
Below, we introduce two inferential approaches to survey sampling, one design-based and the other model-based. In the simple case we are considering, both approaches yield essentially the same estimation procedures. For example, from both perspectives, the sample proportion of Obama’s supporters in a poll is a good estimate of the population proportion. However, the two approaches are conceptually quite different. The basic idea of the design-based approach is that all statistical properties of one’s estimator are based solely on the actual data collection procedure employed by the researcher (here, simple random sampling). Thus, this approach faithfully follows the research design.
In contrast, the model-based approach requires the researcher to specify a probability model. Such a model represents an approximation to the actual data generating process. As described in more detail below, each approach has its advantages and disadvantages.
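The simple random sampling procedure described above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the text: the population size N = 10,000, the sample size n = 500, the count of 5,200 supporters, and the helper names `simple_random_sample` and `sample_proportion`. The sketch draws n voters without replacement from a hypothetical sampling frame, computes the sample proportion, and then repeats the draw many times to show, in the design-based spirit, that the estimator's behavior comes entirely from the sampling procedure.

```python
import random


def simple_random_sample(frame_size, n, rng):
    """Return n distinct voter indices drawn without replacement.

    Every index in range(frame_size) has inclusion probability n / frame_size.
    """
    return rng.sample(range(frame_size), n)


def sample_proportion(responses):
    """Share of sampled voters coded 1 (supporter)."""
    return sum(responses) / len(responses)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = supports the candidate, 0 = does not.
N = 10_000
n = 500
num_supporters = 5_200  # assumed value, chosen only for illustration
population = [1] * num_supporters + [0] * (N - num_supporters)
true_proportion = num_supporters / N

# One poll: sample n voters from the frame and record their responses.
sampled_ids = simple_random_sample(N, n, rng)
p_hat = sample_proportion([population[i] for i in sampled_ids])

# Repeat the same design many times; the population stays fixed and only
# the random selection of voters changes from poll to poll.
num_polls = 2_000
estimates = []
for _ in range(num_polls):
    ids = simple_random_sample(N, n, rng)
    estimates.append(sample_proportion([population[i] for i in ids]))
mean_estimate = sum(estimates) / num_polls

print(f"true proportion:          {true_proportion:.3f}")
print(f"single-poll estimate:     {p_hat:.3f}")
print(f"average over {num_polls} polls: {mean_estimate:.3f}")
```

In this sketch the population values are treated as fixed, and the only source of randomness is which voters end up in the sample. Under that assumption, the average of the estimates across repeated polls should sit close to the true proportion, while any single poll can deviate from it.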