High-Dimensional Density Estimation for Data Mining Tasks

Consider the problem of estimating an unknown high-dimensional density whose support lies on an unknown low-dimensional data manifold. This problem arises in many data mining tasks. The paper proposes a new, geometrically motivated solution within the manifold learning framework, which also estimates the unknown support of the density. First, a tangent bundle manifold learning problem is solved, which transforms the high-dimensional data into low-dimensional features and yields an estimate of the Riemannian tensor on the data manifold. Next, the unknown density of the constructed features is estimated with an appropriate kernel approach. Finally, the estimated Riemannian tensor is used to construct the final estimator of the initial density.
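
The three steps can be illustrated with a minimal Python sketch. It is not the paper's procedure. It assumes that a global PCA projection stands in for the learned low-dimensional features, that local PCA on nearest neighbours stands in for the tangent space estimate, and that a Gaussian kernel with a rule-of-thumb bandwidth is an acceptable kernel estimator. All function names and the parameter k are hypothetical. Under these assumptions, the feature density q(y) is converted to a density on the manifold by the volume factor |det(P^T Q)|, where P is the projection basis and Q is the local tangent basis.

```python
import numpy as np

def tangent_basis(X, x, d, k):
    """Orthonormal basis (D x d) of the tangent space near x (local PCA on k neighbours)."""
    idx = np.argsort(((X - x) ** 2).sum(axis=1))[:k]
    nb = X[idx] - X[idx].mean(axis=0)
    _, _, vt = np.linalg.svd(nb, full_matrices=False)
    return vt[:d].T

def gaussian_kde(Y, y, h):
    """Gaussian kernel density estimate at feature point y from features Y (n x d)."""
    n, d = Y.shape
    z = (Y - y) / h
    return np.exp(-0.5 * (z ** 2).sum(axis=1)).sum() / (n * (h * np.sqrt(2 * np.pi)) ** d)

def manifold_density(X, x, d, k=30):
    """Density at x with respect to the volume measure of the data manifold."""
    mean = X.mean(axis=0)
    # Step 1a (assumption): features from a global linear projection onto d principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    P = vt[:d].T                                # D x d projection basis
    Y = (X - mean) @ P                          # features of the sample
    y = (x - mean) @ P                          # feature of the query point
    # Step 1b: tangent space estimate at the query point.
    Q = tangent_basis(X, x, d, k)
    # Step 2: kernel estimate of the feature density (rule-of-thumb bandwidth, an assumption).
    h = Y.std() * len(X) ** (-1.0 / (d + 4))
    q = gaussian_kde(Y, y, h)
    # Step 3: volume correction. The projection shrinks tangent volume by |det(P^T Q)|,
    # so the density on the manifold is the feature density times that factor.
    return q * abs(np.linalg.det(P.T @ Q))

# Toy data: a parabola in the plane, sampled uniformly in the parameter t.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, 2000)
X = np.column_stack([t, t ** 2])
print(manifold_density(X, np.array([0.0, 0.0]), d=1))
print(manifold_density(X, np.array([0.5, 0.25]), d=1))
```

For this toy curve the density with respect to arc length is 0.5 / sqrt(1 + 4 t^2), so the two printed values should be close to 0.5 and 0.354. The sketch works only when the projection is one-to-one on the manifold. A learned nonlinear embedding, as in the paper, removes that restriction.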
