A Structure-Based Distance Metric for High-Dimensional Space Exploration with Multidimensional Scaling

Although the Euclidean distance measures distances well within high-dimensional clusters, it gauges intercluster distances poorly. This significantly degrades the quality of global low-dimensional embedding procedures such as the popular multidimensional scaling (MDS), which often produce nonintuitive layouts. We were inspired by the perceptual processes evoked by parallel coordinates, which let users visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and propose a metric that captures this structure directly in high-dimensional space. This allows us to better gauge the distances between spatially distant data constellations and so achieve data aggregations in MDS plots that better reflect existing high-dimensional structure similarities. Our bi-scale framework distinguishes far distances from near distances: the coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.
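The bi-scale idea can be illustrated with a minimal sketch. The abstract does not define the structural metric, so the code below substitutes an assumed stand-in: one minus the Pearson correlation of two cluster centroids taken across the dimension axes, rescaled to [0, 1]. The function names and the `far_scale` parameter are hypothetical, as is the choice to compare centroids rather than individual points.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Plain Euclidean distance, used here for the fine (within-cluster) scale."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def structure_dissimilarity(a, b):
    """Assumed stand-in for the structural metric: (1 - Pearson r) / 2.

    Two profiles whose polylines have the same shape across the axes
    score near 0, regardless of how far apart they lie spatially.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    ac = [x - mean_a for x in a]
    bc = [y - mean_b for y in b]
    norm = math.sqrt(sum(x * x for x in ac)) * math.sqrt(sum(y * y for y in bc))
    if norm == 0.0:
        # A flat profile has no defined correlation; treat it as r = 0.
        return 0.5
    r = sum(x * y for x, y in zip(ac, bc)) / norm
    r = max(-1.0, min(1.0, r))  # guard against floating-point overshoot
    return (1.0 - r) / 2.0


def centroids(points, labels):
    """Mean profile of each cluster, keyed by cluster label."""
    groups = defaultdict(list)
    for p, label in zip(points, labels):
        groups[label].append(p)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def biscale_distances(points, labels, far_scale=1.0):
    """Distance matrix: Euclidean within clusters, structural between clusters."""
    cents = centroids(points, labels)
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                d = euclidean(points[i], points[j])
            else:
                d = far_scale * structure_dissimilarity(
                    cents[labels[i]], cents[labels[j]]
                )
            dist[i][j] = dist[j][i] = d
    return dist


points = [[1, 2, 3], [1.5, 2.5, 3.5], [11, 12, 13], [3, 2, 1]]
labels = ["a", "a", "b", "c"]
D = biscale_distances(points, labels)
```

In this toy data, clusters "a" and "b" are far apart in Euclidean terms but have the same rising profile, so their structural dissimilarity is close to zero, while cluster "c" has the opposite profile and is maximally dissimilar. The resulting matrix could be passed to any MDS implementation that accepts precomputed dissimilarities. How the two scales should be balanced against each other is left open here.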
