Evaluation of Dimensionality Reduction Techniques for Big data

In the digital era, big data is often very high-dimensional and requires a large amount of storage space. Lossless interpretation of such data becomes difficult as the number of dimensions grows. However, not all of these dimensions are necessarily relevant, and some may be interrelated, so the attribute set can contain redundancy. Dimensionality reduction is a technique that focuses on reducing the number of attributes and the complexity of high-dimensional data. This paper presents a detailed study of several dimensionality reduction techniques, namely principal component analysis (PCA), linear discriminant analysis (LDA), kernel principal component analysis (KPCA), singular value decomposition (SVD), and independent component analysis (ICA). It also provides a comparative analysis of these techniques based on various parameters.
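
To make the idea of redundant, interrelated attributes concrete, here is a minimal sketch of PCA reducing three attributes to one. It is an illustration only, not the procedure evaluated in the paper. The toy data set, the function names, and the use of power iteration to find the leading component are all assumptions made for this example; in practice a numerical library would normally be used instead of plain Python.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=500):
    """Power iteration: approximate the dominant eigenvector of cov."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, v):
    """Coordinate of each centered row along the direction v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]

# Hypothetical toy data: the third attribute is exactly twice the first,
# so the attribute set is redundant by construction.
data = [[1.0, 0.5, 2.0], [2.0, 0.4, 4.0], [3.0, 0.6, 6.0], [4.0, 0.5, 8.0]]

centered = center(data)
cov = covariance(centered)
v = first_component(cov)
scores = project(centered, v)  # the data reduced from 3 attributes to 1

# Share of the total variance retained by the single component.
total_var = sum(cov[i][i] for i in range(len(cov)))
kept_var = sum(s * s for s in scores) / (len(scores) - 1)
explained = kept_var / total_var

print("first principal direction:", [round(x, 4) for x in v])
print("explained variance ratio:", round(explained, 4))
```

Because one attribute is a multiple of another, a single component retains nearly all of the variance here. Real data sets rarely behave this cleanly, and choosing how many components to keep is a judgment call that depends on the application.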
