Visual and semantic interpretability of projections of high dimensional data for classification tasks

A number of visual quality measures have been introduced in the visual analytics literature to automatically select the best views of high-dimensional data from a large number of candidate data projections. These methods generally concentrate on the interpretability of the visualization and pay little attention to the interpretability of the projection axes. In this paper, we argue that the interpretability of both the visualizations and the feature transformation functions is crucial for visual exploration of high-dimensional labeled data. We present a two-part user study to examine these two related but orthogonal aspects of interpretability. We first study how humans judge the quality of 2D scatterplots of various datasets with varying numbers of classes, and compare these judgments with ten automated measures, including a number of visual quality measures and related measures from various machine learning fields. We then investigate how user perception of the interpretability of mathematical expressions relates to various automated measures of complexity that can be used to characterize data projection functions. We conclude with a discussion of how automated measures of visual and semantic interpretability of data projections can be used together for exploratory analysis in classification tasks.
