Visual Analysis of Multidimensional Categorical Datasets

We present a set of interactive techniques for the visual analysis of multidimensional categorical data. Our approach is based on Multiple Correspondence Analysis (MCA), which allows one to analyze relationships, patterns, trends, and outliers among dependent categorical variables. We use MCA as a dimensionality reduction technique to project both observations and their attributes into the same 2D space. A first view, a treeview, shows attributes and their domains together with a histogram of their representativity in the dataset, and serves as a compact overview of attribute-related facts. A second view shows both attributes and observations. We use a Voronoi diagram whose cells can be interactively merged to discover salient attributes, cluster values, and bin categories. Barchart legends help assign meaning to the axes of the 2D view and to 2D point clusters. We illustrate our techniques with real-world application data.
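To make the projection step concrete, the following is a minimal sketch of textbook MCA, computed as a correspondence analysis of the one-hot indicator matrix via a singular value decomposition. It is not the implementation described here; the function name `mca_2d`, the tuple-based input format, and the toy data are assumptions made for illustration only. Category labels are assumed to be strings.

```python
import numpy as np

def mca_2d(rows):
    """Project observations and attribute values into one shared 2D space.

    rows: list of equal-length tuples of category labels (one per observation).
    Returns (observation coords, category coords, list of (attribute, value)).
    Hypothetical helper for illustration only.
    """
    n, q = len(rows), len(rows[0])
    # Enumerate every observed (attribute index, value) pair as one column.
    cats = sorted({(j, row[j]) for row in rows for j in range(q)})
    col = {c: k for k, c in enumerate(cats)}

    # One-hot indicator matrix: Z[i, k] = 1 if observation i has category k.
    Z = np.zeros((n, len(cats)))
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            Z[i, col[(j, value)]] = 1.0

    # Correspondence matrix with row masses r and column masses c.
    P = Z / Z.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)

    # Standardized residuals, then SVD.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)

    # Principal coordinates on the first two axes.
    obs_xy = (U[:, :2] * s[:2]) / np.sqrt(r)[:, None]
    cat_xy = (Vt.T[:, :2] * s[:2]) / np.sqrt(c)[:, None]
    return obs_xy, cat_xy, cats

# Toy data (made up): two attributes, "color" and "size".
data = [("red", "small"), ("red", "small"), ("blue", "large"),
        ("blue", "small"), ("green", "large"), ("green", "large")]
obs_xy, cat_xy, cats = mca_2d(data)
for (attr, value), xy in zip(cats, cat_xy):
    print(attr, value, np.round(xy, 3))
```

Because observations and attribute values are expressed on the same axes, both point sets can be drawn in a single scatterplot. Observations with identical attribute values land on the same point in this sketch.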
