Identifying patterns of item missing survey data using latent groups: an observational study

Objectives: To examine whether respondents to a survey of health and physical activity and potential determinants could be grouped according to the questions they missed, known as ‘item missing’.

Design: Observational study of longitudinal data.

Setting: Residents of Brisbane, Australia.

Participants: 6901 people aged 40–65 years in 2007.

Materials and methods: We used a latent class model with a mixture of multinomial distributions and chose the number of classes using the Bayesian information criterion. We used logistic regression to examine if participants’ characteristics were associated with their modal latent class. We used logistic regression to examine whether the amount of item missing in a survey predicted wave missing in the following survey.

Results: Four per cent of participants missed almost one-fifth of the questions, and this group missed more questions in the middle of the survey. Eighty-three per cent of participants completed almost every question, but had a relatively high missing probability for a question on sleep time, a question which had an inconsistent presentation compared with the rest of the survey. Participants who completed almost every question were generally younger and more educated. Participants who completed more questions were less likely to miss the next longitudinal wave.

Conclusions: Examining patterns in item missing data has improved our understanding of how missing data were generated and has informed future survey design to help reduce missing data.
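As an illustration of the modelling approach named in the methods (a latent class mixture model with the number of classes chosen by the Bayesian information criterion), the following is a minimal sketch, not the authors' implementation. It assumes each item is coded only as missing (1) or answered (0), so that each multinomial reduces to a Bernoulli distribution. The function names `fit_latent_classes` and `_log_joint`, the simulated data, and all parameter values are hypothetical and chosen only for demonstration.

```python
import numpy as np


def _log_joint(X, weights, probs):
    # log P(class) + log P(missingness pattern | class), items independent within class
    return np.log(weights) + X @ np.log(probs).T + (1 - X) @ np.log(1 - probs).T


def fit_latent_classes(X, k, n_iter=300, seed=0):
    """EM for a k-class mixture of independent Bernoulli items.

    X is an (n, q) array of 0/1 indicators, where 1 means the item was missed.
    Returns class weights, per-class missing probabilities,
    posterior class memberships, and the BIC.
    """
    rng = np.random.default_rng(seed)
    n, q = X.shape
    weights = np.full(k, 1.0 / k)
    probs = rng.uniform(0.2, 0.8, size=(k, q))  # random start (an assumption)
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent
        lj = _log_joint(X, weights, probs)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: update class sizes and per-item missing probabilities
        nk = resp.sum(axis=0) + 1e-10  # guard against empty classes
        weights = nk / nk.sum()
        probs = np.clip(resp.T @ X / nk[:, None], 1e-6, 1 - 1e-6)
    lj = _log_joint(X, weights, probs)
    log_norm = np.logaddexp.reduce(lj, axis=1, keepdims=True)
    resp = np.exp(lj - log_norm)
    n_params = (k - 1) + k * q  # free class weights plus item probabilities
    bic = -2.0 * float(log_norm.sum()) + n_params * np.log(n)
    return weights, probs, resp, bic


# Simulated survey (hypothetical values): most respondents rarely miss items,
# and a small group misses many.
rng = np.random.default_rng(1)
n, q = 1000, 20
high_missing = rng.random(n) < 0.15
p_missing = np.where(high_missing, 0.30, 0.02)[:, None]
X = (rng.random((n, q)) < p_missing).astype(float)

# Fit 1 to 3 classes and keep the model with the lowest BIC.
fits = {k: fit_latent_classes(X, k) for k in (1, 2, 3)}
bic = {k: fit[3] for k, fit in fits.items()}
best_k = min(bic, key=bic.get)
weights, probs, resp, _ = fits[best_k]
modal_class = resp.argmax(axis=1)  # each respondent's most probable class
print("BIC by number of classes:", bic)
print("Chosen number of classes:", best_k)
```

The modal class computed at the end corresponds to the quantity the abstract describes regressing on participant characteristics. In practice a single random start can settle in a local optimum, so several starts per value of `k` would be a reasonable extension.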
