Bayes-based Non-Bayesian Inference on Finite Populations from Non-representative Samples: A Unified Approach*

* Based on S. N. Roy Memorial Lecture in the symposium.

Abstract Classical inference on finite populations is based on probability samples drawn from the target population with predefined selection probabilities. The target population parameters are either descriptive statistics, such as totals or proportions, or parameters of statistical models assumed to hold for the population values. Familiar examples of model estimation include the estimation of income elasticities from household surveys, comparisons of pupils' achievements from educational surveys, and the study of causal relationships between risk factors and disease prevalence from health surveys. Models are also routinely used to account for measurement errors and for small area estimation when the samples are small in at least some of the areas. In practice, the samples selected are often not representative of the finite populations from which they are taken. This happens when the sample selection probabilities are correlated with the model target values, known as informative sampling, or when observations are missing because of not missing at random (NMAR) nonresponse. Sometimes the samples are subject to mode effects resulting from the use of different answering methods for different sample units, and in more extreme cases the samples are drawn from sub-populations, as in web-based surveys or observational studies. The focus of this article is to discuss and illustrate how all these diverse scenarios can be handled in a unified manner by use of Bayes' theorem. Bayes' theorem makes it possible to relate the model holding for the observed data to the model holding for the missing data and to the model operating in the target population. I discuss different estimation procedures and review published articles that illustrate their performance.
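
The relationship described in the abstract can be written out as a short sketch. The notation is an assumption introduced here for illustration and is not taken from the text: $y$ is the outcome, $x$ the covariates, $I$ the indicator that a unit is observed (selected and responding), $f_p$ the population model, $f_s$ the model holding for the observed data, and $f_c$ the model holding for the unobserved data. Applying Bayes' theorem to the conditional distribution of $y$ given $I$ gives:

```latex
\begin{align}
f_s(y \mid x) &= f_p(y \mid x, I = 1)
  = \frac{\Pr(I = 1 \mid y, x)\, f_p(y \mid x)}{\Pr(I = 1 \mid x)}, \\
f_c(y \mid x) &= f_p(y \mid x, I = 0)
  = \frac{\Pr(I = 0 \mid y, x)\, f_p(y \mid x)}{\Pr(I = 0 \mid x)}.
\end{align}
```

Under this sketch, if $\Pr(I = 1 \mid y, x) = \Pr(I = 1 \mid x)$ for all $y$, the three models coincide and the selection can be ignored; otherwise the observed-data model differs from the population model and fitting $f_p$ directly to the sample would generally be biased.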
