On the Selection of Dimension Reduction Techniques for Scientific Applications

Many dimension reduction methods have been proposed to discover the intrinsic, lower-dimensional structure of a high-dimensional dataset. However, identifying the critical features in datasets with a large number of features remains a challenge. In this article, through a series of carefully designed experiments on real-world datasets, we investigate the performance of different dimension reduction techniques, ranging from feature subset selection to methods that transform the features into a lower-dimensional space. We also discuss methods that estimate the intrinsic dimensionality of a dataset, to help determine an appropriate reduced dimension. Using several evaluation strategies, we show how these different methods can provide useful insights into the data. These comparisons enable us to provide guidance to users on selecting a technique for their dataset.
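The abstract contrasts two families of methods, feature subset selection (which keeps some of the original features) and feature transformation (which maps all features into a new lower-dimensional space), together with intrinsic dimensionality estimation. The following is a minimal sketch, assuming NumPy, that shows one simple representative of each: a correlation-based filter for subset selection, principal component analysis (PCA) for transformation, and a cumulative explained-variance threshold as a crude intrinsic dimension estimate. These are illustrative stand-ins, not the specific methods or evaluation strategies used in the article; the function names, the 0.95 threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def pca_intrinsic_dim(X, threshold=0.95):
    """Estimate intrinsic dimensionality as the smallest number of principal
    components whose cumulative explained variance reaches `threshold`.
    A simple global, linear estimate (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # variance captured by each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    d = int(np.searchsorted(cumulative, threshold)) + 1
    return min(d, len(s))


def pca_project(X, d):
    """Feature transformation: project centered data onto the top-d
    principal directions. Each new coordinate mixes all original features."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def top_k_by_correlation(X, y, k):
    """Filter-style feature subset selection: return the indices of the k
    original features with the largest absolute Pearson correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
    # Guard against constant (zero-variance) features to avoid dividing by 0.
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(corr)[::-1][:k]


# Hypothetical synthetic data: 10 observed features driven by 2 latent
# variables plus small noise, so the true intrinsic dimension is 2.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 2))
A = np.zeros((2, 10))
A[0, :5] = 1.0   # features 0-4 follow the first latent variable
A[1, 5:] = 1.0   # features 5-9 follow the second latent variable
X = Z @ A + 0.01 * rng.normal(size=(n, 10))
y = Z[:, 0]      # the target depends only on the first latent variable

d = pca_intrinsic_dim(X)
X_reduced = pca_project(X, d)
selected = top_k_by_correlation(X, y, 5)

print("estimated intrinsic dimension:", d)
print("PCA output shape:", X_reduced.shape)
print("selected original features:", sorted(selected.tolist()))
```

On this synthetic data the variance threshold should recover the two latent dimensions, and the filter should keep features 0 through 4, the ones tied to the target. The contrast mirrors the question the abstract raises: the PCA coordinates are compact, but each one mixes all ten original features, whereas subset selection returns original, directly interpretable features. A linear variance threshold can also overestimate the dimension of curved (nonlinear) manifolds.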
