Supporting Analysis of Dimensionality Reduction Results with Contrastive Learning

Dimensionality reduction (DR) is frequently used for analyzing and visualizing high-dimensional data because it provides a good first glance at the data. However, interpreting a DR result to gain useful insights requires additional analysis effort, such as identifying clusters and understanding their characteristics. While there are many automatic methods (e.g., density-based clustering methods) for identifying clusters, effective methods for understanding a cluster's characteristics are still lacking. A cluster can be largely characterized by its distribution of feature values, but reviewing the original feature values is not straightforward when the number of features is large. To address this challenge, we present a visual analytics method that effectively highlights the essential features of a cluster in a DR result. To extract the essential features, we introduce an enhanced usage of contrastive principal component analysis (cPCA). Our method, called ccPCA (contrasting clusters in PCA), calculates each feature's relative contribution to the contrast between one cluster and the other clusters. With ccPCA, we have created an interactive system that includes a scalable visualization of clusters' feature contributions. We demonstrate the effectiveness of our method and system with case studies using several publicly available datasets.
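To make the idea of a per-feature contribution to the contrast concrete, the following is a minimal sketch of plain cPCA applied to one cluster versus the rest. It is not the ccPCA implementation described above: the function name `contrastive_loadings`, the choice of the cluster as target and all other points as background, and the fixed contrast parameter `alpha` are assumptions made for illustration only.

```python
import numpy as np


def contrastive_loadings(X, labels, cluster, alpha=1.0):
    """Top contrastive direction for one cluster versus all other points.

    Illustrative assumptions (not taken from the paper): the cluster is the
    target set, the remaining points are the background set, and `alpha`
    is a fixed, user-chosen contrast parameter.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    target = X[labels == cluster]
    background = X[labels != cluster]

    # Covariance matrices (features x features); np.cov centers each group.
    cov_target = np.cov(target, rowvar=False)
    cov_background = np.cov(background, rowvar=False)

    # cPCA: directions with high target variance and low background variance
    # are the leading eigenvectors of (C_target - alpha * C_background).
    contrast = cov_target - alpha * cov_background
    eigenvalues, eigenvectors = np.linalg.eigh(contrast)  # ascending order

    direction = eigenvectors[:, -1]  # eigenvector of the largest eigenvalue

    # Eigenvector signs are arbitrary; make the largest-magnitude loading positive.
    direction = direction * np.sign(direction[np.argmax(np.abs(direction))])
    return direction
```

Each entry of the returned unit vector is the loading of one feature on the top contrastive direction, so features with large absolute loadings are the ones that most distinguish the cluster's variation from that of the remaining points under these assumptions.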
