Quality-based guidance for exploratory dimensionality reduction

High-dimensional data sets containing hundreds of variables are difficult to explore, as traditional visualization methods are often unable to represent such data effectively. This is commonly addressed by applying dimensionality reduction prior to visualization. Numerous dimensionality reduction methods are available; however, few take the importance of several structures into account, and few provide an overview of the structures existing in the full high-dimensional data set. For exploratory analysis, as well as for many other tasks, several structures may be of interest, and exploration of the full data set without reduction may also be desirable. This paper presents flexible methods for exploratory analysis and interactive dimensionality reduction. Automated methods analyse the variables using a range of quality metrics, providing one or more measures of ‘interestingness’ for each variable. Through ranking, a single interestingness value based on several quality metrics is obtained for each variable, which can be used as a threshold for selecting the most interesting variables. An interactive environment is presented in which the user has many possibilities to explore and gain understanding of the high-dimensional data set. Guided by this, the analyst can explore the data set and interactively select a subset of the potentially most interesting variables, employing various methods for dimensionality reduction. The system is demonstrated through a use case analysing data from a DNA sequence-based study of bacterial populations.
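The ranking step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the two quality metrics (per-variable variance and strongest absolute correlation with any other variable), the rank normalisation, the equal weights, and the function names `rank_scores`, `interestingness` and `select_variables` are all assumptions chosen for illustration.

```python
import numpy as np


def rank_scores(values):
    """Map metric values to [0, 1] by rank (1.0 = largest value)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)              # indices from smallest to largest
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))   # rank 0 for the smallest value
    if len(values) > 1:
        ranks /= len(values) - 1
    return ranks


def interestingness(data, weights=(0.5, 0.5)):
    """Combine two hypothetical quality metrics into one score per variable.

    data: array of shape (n_items, n_variables).
    weights: assumed relative importance of (variance, correlation).
    """
    data = np.asarray(data, dtype=float)
    variance = data.var(axis=0)
    corr = np.nan_to_num(np.corrcoef(data, rowvar=False))  # constant columns -> 0
    np.fill_diagonal(corr, 0.0)             # ignore self-correlation
    max_abs_corr = np.abs(corr).max(axis=0)
    metrics = np.column_stack([rank_scores(variance), rank_scores(max_abs_corr)])
    w = np.asarray(weights, dtype=float)
    return metrics @ (w / w.sum())          # weighted mean of rank scores


def select_variables(scores, keep):
    """Return indices of the `keep` highest-scoring variables."""
    return np.argsort(scores)[::-1][:keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = 3.0 * rng.normal(size=200)
    data = np.column_stack([x, x + 0.01 * rng.normal(size=200),
                            0.1 * rng.normal(size=200)])
    scores = interestingness(data)
    print(scores, select_variables(scores, keep=2))
```

In this sketch, `keep` plays the role of the threshold: variables are ordered by their combined score and only the top ones are passed on to the visualization. Changing `weights` corresponds to the analyst expressing that one kind of structure matters more than another.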
