Performance Evaluation of t-SNE and MDS Dimensionality Reduction Techniques with KNN, ENN and SVM Classifiers

The central goal of this paper is to implement two widely used dimensionality reduction (DR) methods, t-distributed Stochastic Neighbor Embedding (t-SNE) and Multidimensional Scaling (MDS), in Matlab and to observe their behavior on several datasets. The DR techniques are applied to nine datasets from the UCI Machine Learning Repository: CNAE9, Segmentation, Seeds, Pima Indians Diabetes, Parkinsons, Movement Libras, Mammographic Masses, Knowledge, and Ionosphere. With t-SNE and MDS, each dataset is reduced to half of its original dimensionality, which removes unnecessary features. The reduced datasets are then fed into three supervised classification algorithms: K Nearest Neighbors (KNN), Extended Nearest Neighbors (ENN), and Support Vector Machine (SVM). These classifiers are likewise implemented in Matlab. Each dataset is split into training and test sets in a 90:10 ratio. From the resulting accuracies, the effectiveness of each DR technique with each classifier is analyzed and the performance of each classifier is evaluated.
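The evaluation protocol described above (a 90:10 train/test split followed by nearest-neighbor classification and an accuracy score) can be sketched as follows. This is a minimal illustration in Python, not the authors' Matlab code. It assumes the data have already been embedded by a DR method, uses a synthetic two-class toy set in place of a UCI dataset, and covers only a plain KNN with majority voting. The function names, the choice of k = 3, and the random seeds are all hypothetical.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.1, seed=0):
    """Shuffle the samples and hold out `test_fraction` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k nearest training points (Euclidean)."""
    neighbors = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def knn_accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_x)


# Assumption: a synthetic, already low-dimensional toy set stands in for a
# dataset that has been reduced by t-SNE or MDS. It is not one of the UCI sets.
rng = random.Random(42)
rows, labels = [], []
for label, center in ((0, (0.0, 0.0)), (1, (5.0, 5.0))):
    for _ in range(50):
        rows.append((rng.gauss(center[0], 0.5), rng.gauss(center[1], 0.5)))
        labels.append(label)

(train_x, train_y), (test_x, test_y) = train_test_split(rows, labels)
acc = knn_accuracy(train_x, train_y, test_x, test_y)
print(f"train={len(train_x)} test={len(test_x)} accuracy={acc:.2f}")
```

Repeating the same split and scoring step for each embedding and each classifier gives one accuracy per (DR method, classifier, dataset) combination, which is the kind of comparison the abstract describes.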
