A Review, Framework and R toolkit for Exploring, Evaluating, and Comparing Visualizations

This paper reviews and synthesizes methods for evaluating dimensionality reduction techniques, with particular attention to rank-order neighborhood evaluation metrics. It develops a framework for exploring dimensionality reduction quality through visualization and implements an associated toolkit in R. The toolkit includes scatter plots, heat maps, loess smoothing, and performance lift diagrams. The overall aim is to help researchers compare dimensionality reduction techniques and use visual insights to select and improve them. Examples are given for dimensionality reduction of manifolds and of a consumer survey dataset.
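To make the idea of a rank-order neighborhood metric concrete, the sketch below computes one simple member of that family: the mean fraction of each point's k nearest neighbors that are shared between the original space and the reduced space. It is an illustration only, written in Python rather than R, and the function names `knn_sets` and `neighborhood_agreement` are hypothetical; they are not taken from the toolkit described here.

```python
import math


def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to p.
        # Ties are broken by index so the result is deterministic.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def neighborhood_agreement(high, low, k):
    """Mean fraction of k nearest neighbors shared by the two configurations.

    `high` and `low` are equal-length sequences of coordinate tuples for the
    same items in the original and reduced spaces (hypothetical interface).
    A value of 1.0 means every k-neighborhood is preserved exactly.
    """
    if len(high) != len(low):
        raise ValueError("both configurations must contain the same items")
    if not 1 <= k < len(high):
        raise ValueError("k must satisfy 1 <= k < number of points")
    shared = sum(
        len(a & b) for a, b in zip(knn_sets(high, k), knn_sets(low, k))
    )
    return shared / (k * len(high))


# Five points on a line in 3-D, and two candidate 1-D embeddings.
high = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (10, 0, 0)]
low_faithful = [(0,), (1,), (2,), (3,), (10,)]   # keeps the ordering
low_scrambled = [(0,), (10,), (1,), (3,), (2,)]  # permutes the items

score_faithful = neighborhood_agreement(high, low_faithful, k=2)
score_scrambled = neighborhood_agreement(high, low_scrambled, k=2)
```

Evaluating such a score over a range of k values, rather than at a single k, gives the kind of curve that plots of reduction quality are typically built from. The brute-force neighbor search here is quadratic in the number of points and is meant for small illustrative inputs only.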
