Self-supervised Dimensionality Reduction with Neural Networks and Pseudo-labeling

Dimensionality reduction (DR) is used to explore high-dimensional data in many applications. Deep learning techniques such as autoencoders have been used to provide fast, simple-to-use, and high-quality DR. However, such methods yield worse visual cluster separation than popular methods such as t-SNE and UMAP. We propose a deep learning DR method called Self-Supervised Network Projection (SSNP), which performs DR based on pseudo-labels obtained from clustering. We show that SSNP produces better cluster separation than autoencoders, supports out-of-sample projection, inverse mapping, and clustering, and is fast and easy to use.
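The idea of training a projection network on pseudo-labels can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the architecture, so the sketch assumes a single tanh layer as a 2D bottleneck, a linear decoder for the inverse mapping, a softmax head trained on k-means pseudo-labels, and equal weighting of the reconstruction and classification losses. All function names (`kmeans_labels`, `train_sketch`, `make_demo_data`) are hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one pseudo-label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_sketch(X, k=3, steps=500, lr=0.1, seed=0):
    """Jointly fit a 2D bottleneck, a decoder, and a pseudo-label classifier."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = kmeans_labels(X, k, rng)      # pseudo-labels from clustering
    Y = np.eye(k)[labels]                  # one-hot targets
    W1, b1 = rng.normal(0, 0.1, (d, 2)), np.zeros(2)   # encoder (assumed: one layer)
    W2, b2 = rng.normal(0, 0.1, (2, d)), np.zeros(d)   # decoder (inverse mapping)
    W3, b3 = rng.normal(0, 0.1, (2, k)), np.zeros(k)   # classifier head
    losses = []
    for _ in range(steps):
        # Forward pass
        Z = np.tanh(X @ W1 + b1)
        X_hat = Z @ W2 + b2
        P = softmax(Z @ W3 + b3)
        rec = ((X_hat - X) ** 2).mean()
        ce = -np.log(P[np.arange(n), labels] + 1e-12).mean()
        losses.append(rec + ce)
        # Backward pass (manual gradients of rec + ce)
        g_rec = 2.0 * (X_hat - X) / (n * d)
        g_cls = (P - Y) / n
        g_Z = g_rec @ W2.T + g_cls @ W3.T
        g_A = g_Z * (1.0 - Z ** 2)         # derivative of tanh
        W2 -= lr * (Z.T @ g_rec); b2 -= lr * g_rec.sum(axis=0)
        W3 -= lr * (Z.T @ g_cls); b3 -= lr * g_cls.sum(axis=0)
        W1 -= lr * (X.T @ g_A);   b1 -= lr * g_A.sum(axis=0)

    def encode(A):       # projection; also works on unseen points
        return np.tanh(A @ W1 + b1)

    def decode(Z2):      # inverse mapping from 2D back to data space
        return Z2 @ W2 + b2

    def classify(A):     # cluster assignment predicted by the network
        return softmax(encode(A) @ W3 + b3).argmax(axis=1)

    return encode, decode, classify, labels, losses

def make_demo_data(seed=42, n_per_cluster=50, dim=5):
    """Three standardized Gaussian blobs, used only for illustration."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0, 4, (3, dim))
    X = np.vstack([c + rng.normal(0, 1, (n_per_cluster, dim)) for c in centers])
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = make_demo_data()
encode, decode, classify, pseudo_labels, losses = train_sketch(X, k=3)
projection = encode(X)                  # 2D coordinates for plotting
reconstruction = decode(projection)     # approximate inverse mapping
print(projection.shape, reconstruction.shape, losses[0] > losses[-1])
```

Because the encoder is a parametric function, new samples can be projected with `encode` without retraining, which is the out-of-sample property named in the abstract. A practical version would likely use deeper networks and a deep learning framework rather than hand-written gradients.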
