Intrinsic dimension estimation of data by principal component analysis

Estimating intrinsic dimensionality of data is a classic problem in pattern recognition and statistics. Principal Component Analysis (PCA) is a powerful tool in discovering dimensionality of data sets with a linear structure; it, however, becomes ineffective when data have a nonlinear structure. In this paper, we propose a new PCA-based method to estimate intrinsic dimension of data with nonlinear structures. Our method works by first finding a minimal cover of the data set, then performing PCA locally on each subset in the cover and finally giving the estimation result by checking up the data variance on all small neighborhood regions. The proposed method utilizes the whole data set to estimate its intrinsic dimension and is convenient for incremental learning. In addition, our new PCA procedure can filter out noise in data and converge to a stable estimation with the neighborhood region size increasing. Experiments on synthetic and real world data sets show effectiveness of the proposed method.

[1]  Robert P. W. Duin,et al.  An Evaluation of Intrinsic Dimensionality Estimators , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Siu-Wing Cheng,et al.  Dimension detection via slivers , 2009, SODA.

[3]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[4]  Thorsten M. Buzug,et al.  Characterising experimental time series using local intrinsic dimension , 1995 .

[5]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[6]  H. Sebastian Seung,et al.  The Manifold Ways of Perception , 2000, Science.

[7]  Anil K. Jain,et al.  An Intrinsic Dimensionality Estimator from Near-Neighbor Information , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Csaba Szepesvári,et al.  Manifold-Adaptive Dimension Estimation , 2007, ICML '07.

[9]  Svetlana Lazebnik,et al.  Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization , 2005, NIPS.

[10]  Matthias Hein,et al.  Intrinsic dimensionality estimation of submanifolds in Rd , 2005, ICML.

[11]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[12]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[13]  Gerald Sommer,et al.  Intrinsic Dimensionality Estimation With Optimally Topology Preserving Maps , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  P. Grassberger,et al.  Measuring the Strangeness of Strange Attractors , 1983 .

[15]  Yajun Wang,et al.  Provable Dimension Detection Using Principal Component Analysis , 2008, Int. J. Comput. Geom. Appl..

[16]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[17]  Peter J. Bickel,et al.  Maximum Likelihood Estimation of Intrinsic Dimension , 2004, NIPS.

[18]  J. Rogers Chaos , 1876 .

[19]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[20]  Bo Zhang,et al.  Intrinsic dimension estimation of manifolds by incising balls , 2009, Pattern Recognit..

[21]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[22]  Francesco Camastra,et al.  Data dimensionality estimation methods: a survey , 2003, Pattern Recognit..

[23]  Hongbin Zha,et al.  Riemannian Manifold Learning , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[25]  T. Hastie,et al.  Principal Curves , 2007 .

[26]  Alfred O. Hero,et al.  Geodesic entropic graphs for dimension and entropy estimation in manifold learning , 2004, IEEE Transactions on Signal Processing.

[27]  Keinosuke Fukunaga,et al.  An Algorithm for Finding Intrinsic Dimensionality of Data , 1971, IEEE Transactions on Computers.

[28]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[29]  S. Chatterjee,et al.  Chaos, Fractals and Statistics , 1992 .

[30]  Jose A. Costa,et al.  Estimating Local Intrinsic Dimension with k-Nearest Neighbor Graphs , 2005, IEEE/SP 13th Workshop on Statistical Signal Processing, 2005.

[31]  Balázs Kégl,et al.  Intrinsic Dimension Estimation Using Packing Numbers , 2002, NIPS.