PARAMETRIC DISTRIBUTIONS OF COMPLEX SURVEY DATA UNDER INFORMATIVE PROBABILITY SAMPLING

The sample distribution is defined as the distribution of the sample measurements given the selected sample. Under informative sampling, this distribution is different from the corresponding population distribution, although for several examples the two distributions are shown to be in the same family and only differ in some or all the parameters. A general approach of approximating the marginal sample distribution for a given population distribution and first order sample selection probabilities is discussed and illustrated. Theoretical and simulation results indicate that under common sampling methods of selection with unequal probabilities, when the population measurements are independently drawn from some distribution (superpopulation), the sample measurements are asymptotically independent as the population size increases. This asymptotic independence combined with the approximation of the marginal sample distribution permits the use of standard methods such as direct likelihood inference or residual analysis for inference on the population distribution.

Survey data may be viewed as the outcome of two random processes: the process generating the values in the finite population, often referred to as the 'superpopulation model', and the process selecting the sample data from the finite population values, known as the 'sample selection mechanism'. Analytic inference from survey data relates to the superpopulation model, but when the sample selection probabilities are correlated with the values of the model response variables even after conditioning on auxiliary variables, the sampling mechanism becomes informative and the selection effects need to be accounted for in the inference process. In this article, we propose a general method of inference on the population distribution (model) under informative sampling that consists of approximating the parametric distribution of the sample measurements. The sample distribution is defined as the distribution of measurements corresponding to the units in
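The claim that the sample and population distributions can share a family and differ only in parameters can be illustrated with a small simulation. The sketch below is not taken from the article. It assumes a normal superpopulation N(mu, sigma^2), Poisson sampling, and first order selection probabilities proportional to exp(a * y); the function names, parameter values, and the choice of Poisson sampling are all hypothetical. Under these assumptions, weighting the population density by exp(a * y) and renormalising gives a normal density with mean mu + a * sigma^2 and unchanged variance, so the sample mean should sit near that shifted value rather than near mu.

```python
import math
import random


def draw_informative_sample(mu, sigma, a, pop_size, expected_n, seed=0):
    """Poisson sampling with selection probabilities proportional to exp(a * y).

    All names and parameter choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Superpopulation model: independent N(mu, sigma^2) measurements.
    population = [rng.gauss(mu, sigma) for _ in range(pop_size)]
    # The size measure depends on y itself, which makes the design informative.
    sizes = [math.exp(a * y) for y in population]
    total = sum(sizes)
    sample = []
    for y, z in zip(population, sizes):
        # First order selection probability, capped at 1.
        pi = min(1.0, expected_n * z / total)
        if rng.random() < pi:
            sample.append(y)
    return population, sample


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


mu, sigma, a = 10.0, 2.0, 0.25
population, sample = draw_informative_sample(
    mu, sigma, a, pop_size=200_000, expected_n=5_000
)

# Mean of the exp(a * y)-weighted normal density: mu + a * sigma^2.
predicted_sample_mean = mu + a * sigma ** 2

print("population mean:      ", round(mean(population), 3))
print("sample mean:          ", round(mean(sample), 3))
print("predicted sample mean:", round(predicted_sample_mean, 3))
print("sample variance:      ", round(variance(sample), 3))
```

Fitting the population model directly to such a sample would estimate the shifted mean, which is the selection effect that an analysis based on the sample distribution is meant to account for.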
