Review of inverse probability weighting for dealing with missing data

The simplest approach to dealing with missing data is to restrict the analysis to complete cases, i.e. individuals with no missing values. This can induce bias, however. Inverse probability weighting (IPW) is a commonly used method to correct this bias. It is also used to adjust for unequal sampling fractions in sample surveys. This article reviews the use of IPW in epidemiological research. We describe how the bias in the complete-case analysis arises and how IPW can remove it. We compare IPW with multiple imputation (MI) and explain why, despite MI generally being more efficient, IPW may sometimes be preferred. We discuss the choice of missingness model and methods such as weight truncation, weight stabilisation and augmented IPW. The use of IPW is illustrated on data from the 1958 British Birth Cohort.
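To make the contrast between the complete-case analysis and IPW concrete, the following is a minimal simulated sketch and not an analysis from the article. Everything in it is an assumption chosen for illustration: a single binary covariate `x`, an outcome `y` whose mean depends on `x`, response probabilities of 0.9 and 0.3 in the two strata, and a hypothetical helper `ipw_mean`. The missingness model here is simply the observed response fraction within each level of `x`; in practice it would usually be a regression model such as logistic regression.

```python
import numpy as np


def ipw_mean(y, observed, x):
    """Weighted (Hajek-type) IPW estimate of the mean of y.

    The probability of being observed is estimated as the observed
    fraction within each level of the discrete covariate x
    (an illustrative, saturated missingness model).
    """
    p = np.empty(len(x), dtype=float)
    for level in np.unique(x):
        in_level = x == level
        p[in_level] = observed[in_level].mean()
    # Each complete case is weighted by 1 / P(observed | x).
    w = 1.0 / p[observed]
    return np.sum(w * y[observed]) / np.sum(w)


rng = np.random.default_rng(0)
n = 100_000

# Simulated data (assumed for illustration): E[y] = 1 + 2 * 0.5 = 2.
x = rng.integers(0, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# y is more likely to be missing when x == 1, so missingness
# depends on a fully observed variable that also predicts y.
p_obs = np.where(x == 1, 0.3, 0.9)
observed = rng.random(n) < p_obs

full_mean = y.mean()          # not available in practice
cc_mean = y[observed].mean()  # complete-case estimate
ipw = ipw_mean(y, observed, x)

print(f"full data:     {full_mean:.3f}")
print(f"complete case: {cc_mean:.3f}")
print(f"IPW:           {ipw:.3f}")
```

Under these assumptions the complete cases over-represent the `x == 0` stratum, so the complete-case mean is pulled towards that stratum's mean, whereas weighting each complete case by the inverse of its estimated probability of being observed restores the balance between strata.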
