Recent Developments in Dealing with Item Non‐response in Surveys: A Critical Review

The most common way of treating item non‐response in surveys is to construct one or more replacement values to fill in for each missing value, a process known as imputation. We distinguish single from multiple imputation: single imputation replaces a missing value with one replacement value, whereas multiple imputation uses two or more. This article reviews various imputation procedures used in National Statistical Offices, as well as the properties of point and variance estimators in the presence of imputed survey data. It also presents more recent developments in the field.
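The following Python sketch illustrates the difference between single and multiple imputation for an estimated mean. It assumes one variable, no auxiliary information, simple random sampling and values missing completely at random. Single imputation uses a random hot deck, where each missing value is replaced by a randomly chosen respondent value. Multiple imputation uses the approximate Bayesian bootstrap hot deck (Rubin and Schenker, 1986), and the m completed-data estimates are combined with Rubin's rules. All function names are hypothetical, chosen for illustration only. This is not a description of any particular National Statistical Office procedure.

```python
import random
import statistics


def random_hot_deck(y, rng):
    """Single imputation: replace each missing value (None) by a value drawn
    at random, with replacement, from the respondents (the donor pool)."""
    donors = [v for v in y if v is not None]
    return [v if v is not None else rng.choice(donors) for v in y]


def abb_hot_deck(y, rng):
    """One approximate Bayesian bootstrap draw (Rubin and Schenker, 1986):
    bootstrap the donor pool first, then impute from the resampled pool, so
    that repeated draws also reflect uncertainty about the donor distribution."""
    donors = [v for v in y if v is not None]
    pool = [rng.choice(donors) for _ in donors]
    return [v if v is not None else rng.choice(pool) for v in y]


def mean_and_var(y):
    """Sample mean and its estimated variance s^2 / n, treating the completed
    data as a simple random sample (finite population correction ignored)."""
    return statistics.mean(y), statistics.variance(y) / len(y)


def multiple_imputation(y, m, rng):
    """Create m completed datasets and combine them with Rubin's rules."""
    estimates, within = [], []
    for _ in range(m):
        est, var = mean_and_var(abb_hot_deck(y, rng))
        estimates.append(est)
        within.append(var)
    q_bar = statistics.mean(estimates)     # combined point estimate
    w_bar = statistics.mean(within)        # average within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    return q_bar, w_bar + (1 + 1 / m) * b  # Rubin's total variance


if __name__ == "__main__":
    rng = random.Random(2024)
    # Simulated variable with roughly 30% of values missing completely at random.
    y = [rng.gauss(50, 10) if rng.random() < 0.7 else None for _ in range(200)]
    single_est, naive_var = mean_and_var(random_hot_deck(y, rng))
    mi_est, total_var = multiple_imputation(y, m=5, rng=rng)
    print(f"single imputation:   mean {single_est:.2f}, naive variance {naive_var:.3f}")
    print(f"multiple imputation: mean {mi_est:.2f}, total variance {total_var:.3f}")
```

The naive variance computed from a single completed dataset treats imputed values as if they had been observed, so it tends to understate the true variance. Rubin's total variance adds the between-imputation component (1 + 1/m)B to reflect imputation uncertainty. Under a complex design, the s^2 / n term would have to be replaced by a design-based variance estimator.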
