Distance Preserving Dimension Reduction Using the QR Factorization or the Cholesky Factorization

Dimension reduction plays an important role in handling massive quantities of high-dimensional data such as biomedical text data, gene expression data, and mass spectrometry data. In this paper, we introduce distance preserving dimension reduction (DPDR) based on the QR factorization (DPDR/QR) or the Cholesky factorization (DPDR/C). DPDR generates lower-dimensional representations of high-dimensional data that exactly preserve the Euclidean distances and cosine similarities between every pair of data points in the original space. After projecting the data points onto the lower-dimensional space obtained from DPDR, one can run other data analysis algorithms. DPDR can substantially reduce the computing time and/or memory requirement of a given data analysis algorithm, especially when the algorithm must be run many times to estimate parameters or to search for a better solution.
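
The following is a minimal NumPy sketch of the underlying idea, not the authors' implementation. It assumes the data are arranged as the columns of an m x n matrix A with m >= n and full column rank, so that all pairwise Euclidean distances and cosine similarities are determined by the n x n Gram matrix A^T A; the function names dpdr_qr and dpdr_cholesky are illustrative, not taken from the paper.

```python
import numpy as np

def dpdr_qr(A):
    """Reduced representation via the thin QR factorization A = Q R.

    Since A^T A = R^T R, the columns of the n x n factor R have exactly
    the same pairwise Euclidean distances and cosine similarities as the
    columns of A (a sketch of the DPDR/QR idea, under the assumptions
    stated above)."""
    Q, R = np.linalg.qr(A, mode="reduced")
    return R

def dpdr_cholesky(A):
    """Reduced representation via the Cholesky factorization A^T A = L L^T.

    The columns of L^T play the same role as the columns of R above
    (a sketch of the DPDR/C idea); a small ridge could be added to the
    Gram matrix if it is numerically singular."""
    G = A.T @ A
    L = np.linalg.cholesky(G)
    return L.T

if __name__ == "__main__":
    # Usage: distances in the reduced 50-dimensional space match those in
    # the original 1000-dimensional space up to floating-point round-off.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 50))   # 50 points in 1000 dimensions
    R = dpdr_qr(A)                        # 50 x 50 representation
    i, j = 3, 17
    d_orig = np.linalg.norm(A[:, i] - A[:, j])
    d_red = np.linalg.norm(R[:, i] - R[:, j])
    print(abs(d_orig - d_red))            # ~1e-13
```

After the reduction, any downstream algorithm that depends only on pairwise distances or cosine similarities (e.g., k-means or k-nearest neighbors) can be run on the n x n representation instead of the original m x n data.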
