On Local Intrinsic Dimension Estimation and Its Applications

We present several novel applications of local intrinsic dimension estimation. Much prior work has addressed estimating the global dimension of a data set, typically for the purpose of dimensionality reduction. We show that estimating dimension locally extends dimension estimation to applications that are not possible with a global estimate. We also show that local dimension estimation can be used to obtain a better global dimension estimate, alleviating the negative bias that is common to all known dimension estimation algorithms. We illustrate the use of local dimension estimation in applications such as learning on statistical manifolds, network anomaly detection, clustering, and image segmentation.
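To make the idea of a per-point dimension estimate concrete, the following is a minimal sketch. It assumes the k-nearest-neighbor maximum likelihood estimator of Levina and Bickel as the local estimator; this is an illustrative stand-in and not necessarily the estimator used in the paper. The function name `local_dimension_mle`, the neighborhood size `k`, and the synthetic data (a 2-D sheet and a 1-D segment embedded in 3-D) are all hypothetical choices.

```python
import numpy as np

def local_dimension_mle(points, k=20):
    """Per-point intrinsic dimension via the Levina-Bickel kNN MLE.

    Assumption: brute-force pairwise distances, suitable only for small
    data sets with no duplicate points.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distance matrix (n x n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero self-distance, so keep columns 1..k.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # log(T_k / T_j) for j = 1..k-1, where T_j is the j-th neighbor distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    # Inverse of the mean log ratio gives the local dimension estimate.
    return (k - 1) / log_ratios.sum(axis=1)

# Hypothetical data: a 2-D sheet and a well-separated 1-D segment in 3-D.
rng = np.random.default_rng(0)
sheet = np.column_stack([rng.uniform(0, 1, 500),
                         rng.uniform(0, 1, 500),
                         np.zeros(500)])
segment = np.column_stack([rng.uniform(0, 1, 500),
                           np.full(500, 5.0),
                           np.full(500, 5.0)])
data = np.vstack([sheet, segment])

estimates = local_dimension_mle(data, k=20)
print("median local dimension on sheet:  ", np.median(estimates[:500]))
print("median local dimension on segment:", np.median(estimates[500:]))
```

A single global estimate would assign one dimension to the whole data set, whereas the per-point estimates separate the two components, which is the property that clustering and segmentation applications rely on.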
