Perception-Based Evaluation of Projection Methods for Multidimensional Data Visualization

Similarity-based layouts generated by multidimensional projections or other dimension reduction techniques are commonly used to visualize high-dimensional data. Many projection techniques have recently been proposed, addressing different objectives and application domains. Nonetheless, very little is known about the effectiveness of the generated layouts from a user's perspective, how distinct layouts of the same data compare regarding the typical visualization tasks they support, or how domain-specific issues affect the outcome of the techniques. Learning more about projection usage is an important step towards both consolidating the role of projections in high-dimensional data analysis and making informed decisions when choosing techniques. This work contributes towards this goal. We describe the results of an investigation of the performance of layouts generated by projection techniques as perceived by their users. We conducted a controlled user study to test the following hypotheses: (1) projection performance is task-dependent; (2) certain projections perform better on certain types of tasks; (3) projection performance depends on the nature of the data; and (4) subjects prefer projections with good segregation capability. We generated layouts of high-dimensional data with five techniques representative of different projection approaches. As application domains we investigated image and document data. We identified eight typical tasks: three related to the segregation capability of the projection, three related to projection precision, and two related to the visual clutter incurred. Answers to questions were compared for correctness against "ground truth" computed directly from the data. We also looked at subject confidence and task completion times. Statistical analysis of the collected data confirmed Hypotheses 1 and 3 and partially confirmed Hypothesis 2, whereas Hypothesis 4 could not be confirmed. We discuss our findings in comparison with some numerical measures of projection layout quality. Our results offer insight into the use of projection layouts in data visualization tasks and provide a starting point for further systematic investigations.
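
To make the notion of a numerical measure of projection layout quality concrete, the following is a minimal sketch of one commonly used measure, k-nearest-neighbour preservation. It is an illustration under stated assumptions, not necessarily one of the measures used in this study: the function names, the Euclidean distance, and the toy data are all hypothetical choices made for the example.

```python
import math

def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to p (ties broken by index).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbours.append(set(others[:k]))
    return neighbours

def neighbourhood_preservation(high_dim, layout, k):
    """Mean fraction of each point's k nearest neighbours in the original
    space that are also among its k nearest neighbours in the layout.
    Returns a value in [0, 1]; 1 means all neighbourhoods are preserved."""
    if len(high_dim) != len(layout):
        raise ValueError("high_dim and layout must have the same number of points")
    if not 0 < k < len(high_dim):
        raise ValueError("k must satisfy 0 < k < number of points")
    original = knn_indices(high_dim, k)
    projected = knn_indices(layout, k)
    shared = sum(len(a & b) for a, b in zip(original, projected))
    return shared / (k * len(high_dim))

# Hypothetical toy data: two well-separated pairs of points in 3D.
data = [(0, 0, 0), (0, 0, 1), (10, 10, 10), (10, 10, 11)]
good_layout = [(0, 0), (0, 1), (5, 5), (5, 6)]  # keeps each pair together
bad_layout = [(0, 0), (5, 5), (0, 1), (5, 6)]   # splits each pair apart

print(neighbourhood_preservation(data, good_layout, k=1))  # 1.0
print(neighbourhood_preservation(data, bad_layout, k=1))   # 0.0
```

A measure of this kind is computed from the data and the layout alone, which is what makes it a useful point of comparison for the correctness, confidence, and completion times collected from subjects.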
