Testing the missingness mechanism in longitudinal surveys: a case study using the Health and Retirement Study

ABSTRACT Imputation and likelihood-based approaches to handling missing data assume that the data are missing completely at random (MCAR) or missing at random (MAR). However, little research has examined the missingness pattern before applying these imputation or likelihood methods. Three missingness mechanisms – MCAR, MAR, and not missing at random (NMAR) – can be tested using information on research design, disciplinary knowledge, and appropriate methods. This study summarized six commonly used statistical methods for testing the missingness mechanism and discussed the conditions under which each applies. We further applied these methods to a two-wave longitudinal dataset from the Health and Retirement Study (N = 18,747). Health measures met the MAR assumptions, although we could not completely rule out NMAR. Demographic variables provided auxiliary information. The logistic regression method was applicable to a wide range of scenarios. This study provides a useful guide for choosing methods to test missingness mechanisms according to the research goal and the nature of the data.
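
As an illustration of the logistic regression approach named in the abstract, the sketch below regresses a wave-2 missingness indicator on an observed wave-1 measure. It is a minimal example under stated assumptions, not the authors' implementation: the data are simulated rather than drawn from the Health and Retirement Study, and the names `fit_logistic`, `health_w1`, and `missing_w2` are hypothetical. A slope that differs clearly from zero is evidence against MCAR; it cannot by itself separate MAR from NMAR, because NMAR depends on values that were never observed.

```python
import math
import random


def fit_logistic(x, y, iters=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the approximate standard
    error of the slope taken from the inverse information matrix.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the 2x2 information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g (closed-form 2x2 inverse)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Simulated data (assumption): a standardized wave-1 health score, and
# wave-2 missingness that depends on it, so the data are not MCAR.
random.seed(0)
n = 2000
health_w1 = [random.gauss(0.0, 1.0) for _ in range(n)]
missing_w2 = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * h))) else 0
    for h in health_w1
]

b0, b1, se_b1 = fit_logistic(health_w1, missing_w2)
z = b1 / se_b1  # Wald statistic for H0: missingness is unrelated to health_w1
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, z = {z:.2f}")
```

In practice, several observed predictors (for example, demographic variables used as auxiliary information) would enter the model together, and a standard statistical package would replace the hand-written fit.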
