Unsupervised deep kernel for high dimensional data

In this paper we propose a method for visualizing unlabeled high-dimensional data in a 3-dimensional space using Kernel Principal Component Analysis (KPCA) with a proposed unsupervised deep kernel. First, an optimal cluster structure of the data is determined using an unsupervised procedure. Second, an unsupervised deep kernel is learned from the clustered data. Then, deep-kernel-based PCA is applied to map the data into a 3-dimensional space for visualization. To ensure that the visualization in the 3-dimensional space is reliable, we propose the V3D (Visualizability in 3 Dimension) measure, which evaluates how much structural information is preserved by the dimension reduction process. V3D is computed by comparing the clustering structures of the data before and after dimension reduction. The reduction and visualization results of deep-kernel-based PCA are compared with those of several other methods, including Principal Component Analysis, PCA based on other kernel functions, Entropy Component Analysis, and deep learning approaches. The experimental results show that the deep kernel outperforms all other methods in dimension reduction with respect to the V3D measure.
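
The overall shape of this pipeline (cluster, build a kernel, apply kernel PCA to reach 3 dimensions, then compare cluster structure before and after) can be illustrated with a minimal NumPy sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: an RBF kernel stands in for the learned deep kernel, a plain k-means stands in for the unsupervised clustering procedure, and a Rand index stands in for the V3D measure, whose actual definition is not given in this abstract. The function names (`rbf_kernel`, `kernel_pca`, `kmeans`, `rand_index`, `run_sketch`) and the synthetic data are hypothetical.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """RBF kernel matrix; a placeholder for the learned deep kernel (assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_pca(K, n_components=3):
    """Embed the training points using the top eigenvectors of the centered kernel."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # center in feature space
    vals, vecs = np.linalg.eigh(Kc)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Coordinates of training points: unit eigenvector scaled by sqrt(eigenvalue).
    return vecs * np.sqrt(np.maximum(vals, 0.0))


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; a stand-in for the paper's clustering step (assumption)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two clusterings agree (not the V3D formula)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))


def run_sketch(seed=0):
    """Run the illustrative pipeline on synthetic 20-dimensional blobs."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(3, 20))
    X = np.vstack([c + rng.normal(size=(30, 20)) for c in centers])

    labels_before = kmeans(X, k=3)                 # structure in the original space
    K = rbf_kernel(X, gamma=1.0 / X.shape[1])
    Z = kernel_pca(K, n_components=3)              # 3-dimensional embedding
    labels_after = kmeans(Z, k=3)                  # structure after reduction
    return Z, rand_index(labels_before, labels_after)


if __name__ == "__main__":
    Z, score = run_sketch()
    print(Z.shape, score)
```

A score near 1 means the two clusterings group the points almost identically, which is the kind of before-and-after agreement the abstract describes; how V3D actually quantifies it may differ.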
