Visual Analytics of Missing Data in Epidemiological Cohort Studies

We introduce a visual analytics solution for analyzing and treating missing values. Our solution builds on general approaches for handling missing values but is tailored to the problems of epidemiological cohort study data. The most severe missingness problem in these data is the considerable dropout rate in longitudinal studies, which limits the power of statistical analysis and the validity of study findings. Our work is inspired by discussions with epidemiologists and aims to add visual components to their current statistics-based approaches. In this paper, we present a graphical user interface for exploring missing values, imputing them, and checking the quality of the imputations.
