Visualizing High-Dimensional Data Using t-Distributed Stochastic Neighbor Embedding Algorithm

Data visualization is a powerful tool, widely adopted by organizations because it helps them extract the right information and understand and interpret results clearly and easily. A real challenge in any data science exploration is visualizing the data. Bar plots and pie charts are effective ways to explore a discrete, categorical attribute. Many datasets, however, have a large number of features; in other words, the data is distributed across a large number of dimensions. Visually exploring such high-dimensional data becomes challenging and can be practically impossible to do manually. Hence it is essential to understand how to visualize high-dimensional datasets. t-Distributed stochastic neighbor embedding (t-SNE) is a dimensionality reduction technique that is particularly well suited to the visualization of high-dimensional datasets.
