A Gaussian Process Decoder with Spectral Mixtures and a Locally Estimated Manifold for Data Visualization

Dimensionality reduction plays an important role in interpreting and visualizing high-dimensional data. Previous methods for data visualization tend to overemphasize local structure while neglecting the preservation of global structure. In this study, we develop a Gaussian process latent variable model (GP-LVM) for data visualization. GP-LVMs are a probabilistic, nonlinear generalization of principal component analysis and preserve global structure effectively. Their drawbacks are the absence of local structure preservation and the use of kernel functions with limited expressiveness. We therefore extend the GP-LVM with a regularizer for local structure preservation, based on a locally estimated manifold, and an expressive spectral mixture kernel. As a result, the low-dimensional representations reflect both global and local structure, improving the reliability and visual clarity of the embeddings. We conduct qualitative and quantitative experiments on image and text datasets, comparing our method with baseline and state-of-the-art methods.
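To make the construction concrete, here is a minimal sketch combining the three ingredients named above: a GP-LVM decoder objective, a spectral mixture kernel, and a penalty for local structure preservation. The abstract does not specify the exact form of the locally estimated manifold term, so the kNN-graph Laplacian penalty below is an illustrative stand-in, and all function names and hyperparameters (`sm_kernel`, `knn_laplacian`, `lam`, `noise`, and so on) are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist


def sm_kernel(X, weights, means, variances):
    """Spectral mixture kernel on latent points X (N, q):
    k(tau) = sum_q w_q * prod_d exp(-2 pi^2 tau_d^2 v_qd) * cos(2 pi tau_d mu_qd)."""
    tau = X[:, None, :] - X[None, :, :]                # pairwise differences (N, N, q)
    K = np.zeros((X.shape[0], X.shape[0]))
    for w, mu, v in zip(weights, means, variances):    # one Gaussian per mixture component
        decay = np.exp(-2.0 * np.pi**2 * tau**2 * v).prod(axis=-1)
        period = np.cos(2.0 * np.pi * tau * mu).prod(axis=-1)
        K += w * decay * period
    return K


def knn_laplacian(Y, k=10):
    """Unnormalized graph Laplacian of a kNN graph built in *data* space,
    used here as a stand-in for the locally estimated manifold."""
    D = cdist(Y, Y)
    W = np.zeros_like(D)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]           # k nearest neighbors, excluding self
    for i, js in enumerate(nbrs):
        W[i, js] = 1.0
    W = np.maximum(W, W.T)                             # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W


def objective(x_flat, Y, L, params, q=2, lam=1.0, noise=1e-2):
    """Negative GP-LVM log marginal likelihood (constants dropped)
    plus a Laplacian penalty encouraging local structure preservation."""
    N, D = Y.shape
    X = x_flat.reshape(N, q)
    K = sm_kernel(X, *params) + noise * np.eye(N)      # jitter keeps K positive definite
    C = np.linalg.cholesky(K)
    alpha = np.linalg.solve(C.T, np.linalg.solve(C, Y))  # K^{-1} Y
    nll = D * np.log(np.diag(C)).sum() + 0.5 * np.sum(Y * alpha)
    reg = lam * np.trace(X.T @ L @ X)                  # small when data-space neighbors stay close
    return nll + reg


rng = np.random.default_rng(0)
Y = rng.standard_normal((60, 20))                      # toy high-dimensional data
L = knn_laplacian(Y, k=5)
params = (np.array([1.0, 0.5]),                        # mixture weights     (Q=2)
          rng.standard_normal((2, 2)) * 0.1,           # frequency means     (Q, q)
          np.full((2, 2), 0.5))                        # frequency variances (Q, q)
x0 = 0.1 * rng.standard_normal(60 * 2)
# Gradients are finite-differenced for brevity, so keep the iteration budget small.
res = minimize(objective, x0, args=(Y, L, params), method="L-BFGS-B",
               options={"maxiter": 25})
X_embed = res.x.reshape(60, 2)                         # 2-D coordinates for plotting
```

In practice one would also optimize the kernel hyperparameters alongside the latent coordinates and supply analytic gradients (or use an automatic differentiation framework); the fixed parameters and finite-difference optimization here are for illustration only.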
