Mapping Nominal Values to Numbers for Effective Visualization

Data sets with large numbers of nominal variables, including some with a large number of distinct values, are becoming increasingly common and need to be explored. Unfortunately, most existing visual exploration tools are designed to handle numeric variables only. When data sets with nominal values are imported into such visualization tools, most solutions to date are rather simplistic. Often, techniques that map nominal values to numbers do not assign order or spacing among the values in a manner that conveys semantic relationships. Moreover, displays designed for nominal variables usually cannot handle high-cardinality variables well. This paper addresses the problem of how to display nominal variables in general-purpose visual exploration tools designed for numeric variables. Specifically, we investigate (1) how to assign order and spacing among the nominal values, and (2) how to reduce the number of distinct values to display. We propose a new technique, called the Distance-Quantification-Classing (DQC) approach, to preprocess nominal variables before they are imported into a visual exploration tool. In the Distance Step, we identify a set of independent dimensions that can be used to calculate the distance between nominal values. In the Quantification Step, we use the independent dimensions and the distance information to assign order and spacing among the nominal values. In the Classing Step, we use results from the previous steps to determine which values within the domain of a variable are similar to each other and thus can be grouped together. Each step in the DQC approach can be accomplished by a variety of techniques. We extended the XmdvTool package to incorporate this approach. We evaluated our approach on several data sets using a variety of measures.
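As a rough illustration of how the three steps could fit together, the sketch below runs a simple correspondence analysis on a small contingency table and then groups values by their spacing on the first axis. The abstract does not commit to particular techniques, so the choice of correspondence analysis, the gap-based grouping, the function names `quantify_nominal` and `class_by_gap`, the toy counts, and the `max_gap` threshold are all assumptions made for illustration only.

```python
import numpy as np

def quantify_nominal(table):
    """Distance + Quantification (assumed technique: simple correspondence analysis).

    `table` holds co-occurrence counts: one row per value of the nominal
    variable of interest, one column per value of another variable.
    Every row and column must have a nonzero total.
    Returns the first-axis principal coordinate of each row.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals from the independence model r * c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    F = (U * sigma) / np.sqrt(r)[:, None]  # row principal coordinates
    return F[:, 0]                       # sign of the axis is arbitrary

def class_by_gap(coords, max_gap):
    """Classing (assumed technique): sort the values along the axis and
    start a new class whenever the gap to the previous value exceeds max_gap."""
    order = np.argsort(coords)
    labels = np.empty(len(coords), dtype=int)
    current = 0
    labels[order[0]] = current
    for prev, nxt in zip(order[:-1], order[1:]):
        if coords[nxt] - coords[prev] > max_gap:
            current += 1
        labels[nxt] = current
    return labels

# Hypothetical toy data: colour names (rows) against a warm/cool rating (columns).
values = ["red", "crimson", "blue"]
table = [[8, 2],
         [4, 1],
         [1, 9]]

coords = quantify_nominal(table)
labels = class_by_gap(coords, max_gap=0.1)
for name, x, k in zip(values, coords, labels):
    print(f"{name:8s} coordinate={x:+.3f} class={k}")
```

In this toy table, "red" and "crimson" have the same row profile, so they receive the same coordinate and fall into one class, while "blue" is placed far from both. A real data set would typically cross the variable with several others and might retain more than one axis.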
