To identify what is not there: A definition of missingness patterns and evaluation of missing value visualization

While missing data is a common issue in many domains, it is a topic that has been largely overlooked by visualization scientists. Missing data values reduce the reliability of analysis results. A range of methods exists to replace missing values with estimated values, but their appropriateness often depends on the patterns of missingness. A better understanding of the missingness patterns and the distribution of missing values in data may greatly improve reliability, provide valuable insight into potential problems in data gathering and analysis processes, and lead to a better understanding of the data as a whole. Visualization methods are uniquely positioned to support investigation and understanding of missingness patterns by making the missing values and their relationship to recorded values visible. This paper provides an overview of the visualization of missing data values and defines a set of three missingness patterns relevant to understanding missingness in data. It also contributes a usability evaluation that compares visualization methods representing missing values and how well they help users identify missingness patterns. The results indicate differences in performance depending on the visualization method as well as the missingness pattern. Recommendations for the future design of missing data visualization are provided based on the outcome of the study.
