Various dimension reduction techniques for high-dimensional data analysis: a review

In healthcare and its related research fields, the high dimensionality of data is a major challenge, because such data contain a huge number of variables that form complex data matrices. The demand for reducing the dimension of complex data is growing rapidly, as it can improve data prediction, analysis and visualization. In general, dimension reduction is defined as the compression of a dataset from a higher-dimensional matrix to a lower-dimensional matrix. Several computational techniques have been implemented for data dimension reduction, and they fall into two categories: feature extraction and feature selection. In this review, various feature extraction and feature selection methods are investigated in detail, with a systematic comparison of several dimension reduction techniques for analyzing high-dimensional data and addressing the problem of data loss. Some case studies are then cited, drawing on a few advances described in the technical literature, to identify the better approach to data dimension reduction. This review may guide researchers in choosing the most effective method for satisfactory analysis of high-dimensional data.
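The two categories named above differ in what the reduced matrix contains: feature extraction derives a smaller set of new variables, while feature selection keeps a subset of the original ones. The following minimal Python sketch, assuming NumPy is available, contrasts them on synthetic data. Principal component analysis (PCA) stands in for feature extraction and a simple variance-ranking filter stands in for feature selection; both method choices, the hypothetical function names `pca_extract` and `variance_select`, and the synthetic data are illustrative assumptions, not the procedures evaluated in this review.

```python
import numpy as np


def pca_extract(X, k):
    """Feature extraction (illustrative choice: PCA).

    Projects the n x p data matrix onto its top-k principal components and
    returns an n x k matrix of new, derived features (each a linear
    combination of all p original variables), plus the fraction of total
    variance explained by each kept component.
    """
    Xc = X - X.mean(axis=0)  # center each variable
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # variance fraction per component
    return Xc @ Vt[:k].T, explained[:k]


def variance_select(X, k):
    """Feature selection (illustrative choice: a variance filter).

    Keeps the k original columns with the largest variance and returns the
    reduced n x k matrix together with the indices of the kept columns.
    """
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:k])  # top-k, in original order
    return X[:, keep], keep


# Hypothetical synthetic data: 200 samples, 50 variables driven by 3 hidden factors
rng = np.random.default_rng(0)
n, p = 200, 50
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))

Z, explained = pca_extract(X, k=3)  # 3 derived features
S, kept = variance_select(X, k=3)   # 3 of the original features
print(Z.shape, S.shape)  # (200, 3) (200, 3)
print("variance explained by 3 components:", explained.sum())
print("original columns kept:", kept)
```

Both calls compress the 200 x 50 matrix to 200 x 3, but the columns of `Z` mix all 50 variables, whereas `S` keeps three original columns. Selection preserves the interpretability of the retained variables, while extraction can capture information that is spread across many variables.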