Heuristic Framework for Multi-Scale Testing of the Multi-Manifold Hypothesis

When analyzing empirical data, we often find that global linear models overestimate the number of parameters required. In such cases, we may ask whether the data lies on or near a manifold, or a set of manifolds (a multi-manifold), of lower dimension than the ambient space. This question can be phrased as a (multi-)manifold hypothesis. The identification of such intrinsic multiscale features is a cornerstone of data analysis and representation and has given rise to a large body of work on manifold learning. In this work, we review key results on multiscale data analysis and intrinsic dimension. We then introduce a heuristic, multiscale framework for testing the multi-manifold hypothesis. Our method implements a hypothesis test on a set of spline-interpolated manifolds constructed from variance-based intrinsic dimensions. The workflow is suitable for empirical data analysis, as we demonstrate on two use cases.
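The abstract does not spell out how a variance-based intrinsic dimension is computed. One common reading, in the spirit of multiscale local PCA, works as follows:

1. Run PCA on the points inside a ball of radius r.
2. Count how many principal components are needed to explain a fixed fraction of the local variance.
3. Repeat this for several radii.

The sketch below follows that reading under stated assumptions. The function names `local_intrinsic_dimension` and `multiscale_dimensions`, the 0.95 variance threshold and the synthetic noisy-plane data are hypothetical choices for illustration, not the authors' implementation. The spline-interpolation and hypothesis-testing stages are not shown.

```python
import numpy as np


def local_intrinsic_dimension(X, center, radius, threshold=0.95):
    """Variance-based intrinsic dimension of the points in the ball B(center, radius).

    Assumption (illustrative, not the paper's exact rule): the dimension is the
    smallest d such that the top d principal components explain at least
    `threshold` of the local variance.
    """
    dists = np.linalg.norm(X - center, axis=1)
    ball = X[dists <= radius]
    if len(ball) < 2:
        return None  # too few points to estimate a local covariance
    centered = ball - ball.mean(axis=0)
    # Squared singular values are proportional to the local PCA eigenvalues;
    # np.linalg.svd returns them in descending order.
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    total = var.sum()
    if total == 0.0:
        return 0  # all points in the ball coincide
    cumulative = np.cumsum(var) / total
    # First index whose cumulative explained variance reaches the threshold.
    d = int(np.searchsorted(cumulative, threshold)) + 1
    return min(d, len(var))  # guard against round-off when threshold is 1.0


def multiscale_dimensions(X, center, radii, threshold=0.95):
    """Evaluate the local estimate at each radius (one estimate per scale)."""
    return [local_intrinsic_dimension(X, center, r, threshold) for r in radii]


# Hypothetical test data: a 2-D square embedded in R^3 with Gaussian noise
# of standard deviation 0.05 in the third coordinate.
rng = np.random.default_rng(0)
n = 20_000
X = np.column_stack([rng.uniform(-1.0, 1.0, size=(n, 2)),
                     rng.normal(0.0, 0.05, size=n)])
center = np.zeros(3)

dims = multiscale_dimensions(X, center, radii=[0.1, 0.9])
print(dims)  # expected: [3, 2]
```

At the small radius, the noise in the third coordinate carries a sizable share of the local variance, so the estimate is 3. At the larger radius, the planar spread dominates and the estimate drops to 2. This scale dependence shows why a single-scale estimate can misreport the dimension of a noisy manifold.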
