Probabilistic modelling of general noisy multi-manifold data sets

Abstract: The intrinsic nature of noisy and complex data sets is often concealed in low-dimensional structures embedded in a higher-dimensional space. A number of methodologies have been developed to extract and represent such structures in the form of manifolds (i.e. geometric structures that locally resemble continuously deformable intervals of R^j). These methodologies usually require a priori knowledge of the manifold's intrinsic dimensionality. Additionally, their performance can often be hampered by the presence of significant high-dimensional noise aligned along the low-dimensional core manifold. In real-world applications, the data can contain several low-dimensional structures of different dimensionalities. We propose a framework for dimensionality estimation and reconstruction of multiple noisy manifolds embedded in a noisy environment. To the best of our knowledge, this work represents the first attempt at detection and modelling of a set of coexisting general noisy manifolds by uniting two aspects of multi-manifold learning: the recovery and approximation of core noiseless manifolds and the construction of their probabilistic models. The easy-to-understand hyper-parameters can be manipulated to obtain an emerging picture of the multi-manifold structure of the data. We demonstrate the workings of the framework on two synthetic data sets that present challenging features for state-of-the-art techniques in multi-manifold learning. The first data set consists of multiple sampled noisy manifolds of different intrinsic dimensionalities, such as a Möbius strip, a toroid and a spiral arm. The second is a topologically complex set of three interlocked toroids. Given the absence of such unified methodologies in the literature, the comparison with existing techniques is organized along the two separate aspects of our approach mentioned above, namely manifold approximation and probabilistic modelling.
The framework is then applied to a complex data set containing gas volume particles from a particle simulation of a dwarf galaxy interacting with its host galaxy cluster. Detailed analysis of the recovered 1D and 2D manifolds can help us understand the nature of star formation in such complex systems.
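
To make the first synthetic setting more concrete, the following minimal sketch samples a noisy Möbius strip (2D), a torus (2D) and a planar spiral (1D) embedded in R^3. It is an illustration only, not the data-generation procedure used in this work: the function names, radii, offsets, sample counts and the isotropic Gaussian noise model are all assumptions chosen for readability.

```python
import numpy as np


def sample_mobius(n, rng, radius=1.0, half_width=0.3):
    """Noiseless points on a Mobius strip (uniform in parameters, not in area)."""
    u = rng.uniform(0.0, 2.0 * np.pi, n)
    v = rng.uniform(-half_width, half_width, n)
    r = radius + v * np.cos(u / 2.0)
    return np.column_stack([r * np.cos(u), r * np.sin(u), v * np.sin(u / 2.0)])


def sample_torus(n, rng, major=1.0, minor=0.3):
    """Noiseless points on a torus (uniform in parameters, not in area)."""
    u = rng.uniform(0.0, 2.0 * np.pi, n)
    v = rng.uniform(0.0, 2.0 * np.pi, n)
    r = major + minor * np.cos(v)
    return np.column_stack([r * np.cos(u), r * np.sin(u), minor * np.sin(v)])


def sample_spiral(n, rng, turns=2.0, scale=0.1):
    """Noiseless points on a planar Archimedean spiral (a 1D manifold)."""
    t = rng.uniform(0.0, 2.0 * np.pi * turns, n)
    return np.column_stack([scale * t * np.cos(t), scale * t * np.sin(t), np.zeros(n)])


def make_noisy_multi_manifold(n_per_manifold=500, noise_std=0.02, seed=0):
    """Stack the three manifolds, shift them apart and add Gaussian noise.

    Returns the noisy points, an integer label per point, and the intrinsic
    dimensionality of each labelled manifold. All settings are hypothetical.
    """
    rng = np.random.default_rng(seed)
    parts = [
        sample_mobius(n_per_manifold, rng),
        sample_torus(n_per_manifold, rng) + np.array([3.0, 0.0, 0.0]),
        sample_spiral(n_per_manifold, rng) + np.array([0.0, 3.0, 0.0]),
    ]
    points = np.vstack(parts)
    labels = np.repeat(np.arange(len(parts)), n_per_manifold)
    # Isotropic noise in the full embedding space (an assumed noise model).
    points = points + rng.normal(0.0, noise_std, points.shape)
    intrinsic_dims = np.array([2, 2, 1])
    return points, labels, intrinsic_dims


points, labels, intrinsic_dims = make_noisy_multi_manifold()
print(points.shape, np.bincount(labels), intrinsic_dims)
```

Because the parameters are drawn uniformly, the sampling density on each surface is not uniform. A data set of this kind is still useful as a quick test bed, since the true label and intrinsic dimensionality of every point are known.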
