Reinventing the Contingency Wheel: Scalable Visual Analytics of Large Categorical Data

Contingency tables summarize the relations between categorical variables and arise in both scientific and business domains. Asymmetrically large two-way contingency tables pose a problem for common visualization methods. The Contingency Wheel was recently proposed as an interactive visual method to explore and analyze such tables. However, the scalability and readability of this method are limited when dealing with large and dense tables. In this paper we present Contingency Wheel++, a set of new visual analytics methods that overcome these major shortcomings: (1) regarding automated methods, a measure of association based on Pearson's residuals alleviates the bias of the raw residuals originally used; (2) regarding visualization methods, a frequency-based abstraction of the visual elements eliminates overlapping and makes it possible to analyze both positive and negative associations; and (3) regarding the interactive exploration environment, a multi-level overview+detail interface enables exploring individual data items that are aggregated in the visualization or in the table using coordinated views. We illustrate the applicability of these new methods with a use case and show how they enable discovering and analyzing nontrivial patterns and associations in large categorical data.
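To make the distinction between raw and Pearson residuals concrete, the following minimal sketch computes both for a small two-way table. It uses the textbook definitions (expected count under independence, raw residual as observed minus expected, Pearson residual as the raw residual divided by the square root of the expected count). The abstract does not give the exact association measure used in Contingency Wheel++, so this is an illustration of the underlying quantities only; the function name `residuals` and the example counts are hypothetical.

```python
from math import sqrt


def residuals(table):
    """Return (raw, pearson) residual matrices for a two-way table of counts.

    Textbook definitions, not necessarily the paper's exact measure:
      expected_ij = row_total_i * col_total_j / n
      raw_ij      = observed_ij - expected_ij
      pearson_ij  = raw_ij / sqrt(expected_ij)
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    raw, pearson = [], []
    for i, row in enumerate(table):
        raw_row, pearson_row = [], []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = observed - expected
            raw_row.append(diff)
            # Guard against empty rows or columns (expected count of zero).
            pearson_row.append(diff / sqrt(expected) if expected > 0 else 0.0)
        raw.append(raw_row)
        pearson.append(pearson_row)
    return raw, pearson


# Hypothetical counts: two row categories, two column categories.
example = [[30, 10],
           [10, 30]]
raw, pearson = residuals(example)
```

Raw residuals grow with the marginal totals, so frequent categories tend to dominate; dividing by the square root of the expected count puts cells with very different expected frequencies on a comparable scale. Positive values indicate more observations than independence would predict, negative values fewer.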
