Parametric nonlinear dimensionality reduction using kernel t-SNE

Abstract: Novel non-parametric dimensionality reduction techniques such as t-distributed stochastic neighbor embedding (t-SNE) yield powerful and flexible visualizations of high-dimensional data. One drawback of non-parametric techniques is that they lack an explicit out-of-sample extension. In this contribution, we propose kernel t-SNE, an efficient extension of t-SNE to a parametric framework that preserves the flexibility of basic t-SNE but enables explicit out-of-sample extensions. We compare kernel t-SNE with standard t-SNE on benchmark data sets, in particular addressing how well the mapping generalizes to novel data. For large data sets, this procedure allows us to train a mapping on a fixed-size subset only and to map all data afterwards in linear time. We demonstrate that this technique also yields satisfactory results for large data sets, provided that the information missing due to the small size of the subset is compensated by auxiliary information such as class labels, which can be integrated into kernel t-SNE based on the Fisher information.
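The abstract does not spell out the form of the parametric mapping, so the following is only a minimal sketch of one way an explicit out-of-sample map could be built from an existing embedding. It assumes a normalized Gaussian kernel expansion over the training subset whose coefficients are fitted by a pseudo-inverse. The function names, the bandwidth `sigma`, and the random stand-in for the t-SNE output `Y_train` are all hypothetical choices made for illustration.

```python
import numpy as np

def normalized_gaussian_kernel(X, centers, sigma):
    """Row-normalized Gaussian kernel between points X and training centers."""
    # Pairwise squared Euclidean distances, shape (len(X), len(centers)).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Shifting each row by its minimum leaves the normalized values unchanged
    # but avoids an all-zero row when every distance is large.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return K / K.sum(axis=1, keepdims=True)

def fit_kernel_map(X_train, Y_train, sigma):
    """Fit coefficients A so that K(X_train) @ A approximates Y_train."""
    K = normalized_gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.pinv(K) @ Y_train

def apply_kernel_map(X_new, X_train, A, sigma):
    """Map new points; the cost per point is linear in the subset size."""
    return normalized_gaussian_kernel(X_new, X_train, sigma) @ A

# Illustrative data only: Y_train stands in for a t-SNE embedding of X_train.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))
Y_train = rng.normal(size=(50, 2))   # assumption: any 2-D embedding works here
X_new = rng.normal(size=(10, 5))

sigma = 0.5                          # hypothetical bandwidth
A = fit_kernel_map(X_train, Y_train, sigma)
Y_new = apply_kernel_map(X_new, X_train, A, sigma)
```

In this sketch the embedding of the training subset is computed once by any t-SNE implementation, and every further point is projected with a single kernel evaluation against the subset, which is what makes the cost of mapping the full data set linear in the number of points.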
