Multi-Class Data Exploration Using Space Transformed Visualization Plots

Visualizing large datasets is computationally expensive, so enveloping methods are often used instead: rather than drawing every data record, they draw summary statistics of the data in a space transformed visualization (STV) plot such as the traditional parallel coordinate plot (TPCP). Existing enveloping methods, however, are limited to the TPCP and can be misleading, because the parallel coordinates are parameter transformations and summary statistics computed on the original data records are not preserved under the transformation to the parallel coordinates space. We propose enveloping methods that avoid this drawback and that apply not only to the TPCP but also to a family of STV plots, including the smooth parallel coordinate plot (SPCP) and the Andrews plot. We apply the proposed methods to the min–max, quartile, and concentration interval envelopes (CIEs). These envelopes describe the geometry of given classes visually without drawing every data record. As illustrated on real datasets, the methods are effective for large datasets because they mitigate the clutter that large classes produce in STV plots. Supplemental materials, including R code, are available online so that readers can reproduce the graphs in this article or apply the proposed methods to their own data.
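
The abstract's point that summary statistics are not preserved under the transformation can be illustrated with a small sketch. The Python below is not the authors' R code and is not taken from the article; it is one plausible reading of a min–max envelope for the Andrews plot, using the standard Andrews (1972) curve. The function names (`andrews_curve`, `minmax_envelope`, `transformed_minimum`) and the three toy records are hypothetical. It compares an envelope computed pointwise in the transformed space with the curve obtained by transforming the coordinate-wise minimum of the original records.

```python
import math


def andrews_curve(x, t):
    """Andrews curve value at angle t for one record x:
    x[0]/sqrt(2) + x[1]*sin(t) + x[2]*cos(t) + x[3]*sin(2t) + ..."""
    value = x[0] / math.sqrt(2.0)
    for j in range(1, len(x)):
        k = (j + 1) // 2  # harmonic index: 1, 1, 2, 2, 3, 3, ...
        basis = math.sin(k * t) if j % 2 == 1 else math.cos(k * t)
        value += x[j] * basis
    return value


def minmax_envelope(records, ts):
    """Envelope computed in the transformed space (assumed approach):
    at each t, take the min and max over the transformed records."""
    lower, upper = [], []
    for t in ts:
        values = [andrews_curve(x, t) for x in records]
        lower.append(min(values))
        upper.append(max(values))
    return lower, upper


def transformed_minimum(records, ts):
    """Naive alternative: summarize in the original space first,
    then transform the coordinate-wise minimum vector."""
    col_min = [min(col) for col in zip(*records)]
    return [andrews_curve(col_min, t) for t in ts]


# Hypothetical toy class with three records in three dimensions.
records = [(1.0, 4.0, 2.0), (2.0, 1.0, 5.0), (3.0, 3.0, 1.0)]
ts = [-math.pi + 2.0 * math.pi * i / 200 for i in range(201)]

lower, upper = minmax_envelope(records, ts)
naive = transformed_minimum(records, ts)

# Count the grid points where the naive curve sits above the true lower
# envelope, that is, where it fails to bound the class from below.
violations = sum(1 for n, lo in zip(naive, lower) if n > lo + 1e-9)
print(f"naive lower bound fails at {violations} of {len(ts)} grid points")
```

Because the sine and cosine weights change sign with t, the curve of the coordinate-wise minimum is not a lower bound for the class in the Andrews plot, whereas the pointwise envelope bounds every curve by construction. The same pointwise idea could be applied with quartiles in place of the minimum and maximum, though the article's exact constructions may differ.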
