An Overview of Numerical Acceleration Techniques for Nonlinear Dimension Reduction

We live in an increasingly data-dependent world: making sense of large, high-dimensional data sets is an important task for researchers in academia, industry, and government. Machine learning techniques, in particular nonlinear dimension reduction, seek to organize this wealth of data by extracting descriptive features. These techniques, though powerful in their ability to find compact representations, are hampered by high computational costs: implemented naively, they cannot process large modern data collections in reasonable time or with modest computational resources. In this summary article we discuss some of the important numerical techniques that drastically increase the computational efficiency of these methods while preserving much of their representational power. Specifically, we address random projections, approximate k-nearest neighbor graphs, approximate kernel methods, and approximate matrix decompositions.
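To make the first of these techniques concrete, the following is a minimal sketch (not code from the article) of a Gaussian random projection. By the Johnson-Lindenstrauss lemma, mapping n points in R^d to R^k with k on the order of log(n)/eps^2 approximately preserves pairwise distances; the function name, shapes, and seeds below are illustrative assumptions.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project the rows of X (n x d) to k dimensions via a random Gaussian map."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Scale by 1/sqrt(k) so squared norms are preserved in expectation.
    R = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ R

# Usage: pairwise distances before and after projection stay close,
# even though the dimension drops from 1000 to 200.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 1000))
Y = random_projection(X, 200)
orig_dist = np.linalg.norm(X[0] - X[1])
proj_dist = np.linalg.norm(Y[0] - Y[1])
ratio = proj_dist / orig_dist  # concentrates near 1 as k grows
```

Because the projection matrix is data-independent, the cost is a single matrix multiply, which is what makes this a useful preprocessing step before the more expensive neighborhood and spectral computations.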
