A survey of dimensionality reduction techniques

In recent decades, experimental life sciences such as biology and chemistry have seen an explosion in the data available from experiments. Laboratory instruments have become increasingly complex and report hundreds or thousands of measurements for a single experiment, so statistical methods face challenging tasks when dealing with such high-dimensional data. However, much of this data is highly redundant and can be efficiently reduced to a much smaller number of variables without a significant loss of information. The mathematical procedures that make this reduction possible are called dimensionality reduction techniques; they have been widely developed in fields such as statistics and machine learning, and are currently an active research topic. In this review we categorize the many dimensionality reduction techniques available and give the mathematical insight behind them.
