Practical Considerations on Nonparametric Methods for Estimating Intrinsic Dimensions of Nonlinear Data Structures

This paper develops readily applicable methods for estimating the intrinsic dimension of multivariate datasets. The proposed methods, which make use of theoretical properties of the empirical distribution functions of (pairwise or pointwise) distances, build on the existing concepts of (i) correlation dimensions and (ii) charting manifolds. They are contrasted with (iii) a maximum likelihood technique and (iv) other recently proposed geometric methods, including MiND and IDEA. The comparison relies on application studies involving simulated examples, a recorded dataset from a glucose processing facility, and several benchmark datasets available from the literature. The performance of the proposed techniques is generally in line with that of other dimension estimators; in particular, the correlation dimension variants compare favorably with the maximum likelihood method in terms of accuracy and computational efficiency.
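To make the idea of a correlation dimension concrete, the sketch below implements the classical slope estimate: the empirical distribution function of pairwise distances, C(r), is evaluated on a grid of radii, and the dimension is read off as the slope of log C(r) against log r. This is a minimal illustration of the textbook approach, not the variants proposed in this paper. The function name `correlation_dimension`, the radius bounds `r_low` and `r_high`, and the helix test data are assumptions made for illustration only.

```python
import numpy as np


def correlation_dimension(X, r_low, r_high, n_radii=20):
    """Estimate the correlation dimension of the rows of X.

    Hypothetical helper: fits a straight line to log C(r) versus log r,
    where C(r) is the fraction of point pairs at distance <= r.
    The radius range [r_low, r_high] must be chosen by the user.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # All pairwise Euclidean distances (O(n^2) memory; fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    d = dist[np.triu_indices(n, k=1)]  # each unordered pair once

    # Empirical distribution function of the distances on a log-spaced grid.
    radii = np.logspace(np.log10(r_low), np.log10(r_high), n_radii)
    C = np.array([(d <= r).mean() for r in radii])

    # Keep only radii with at least one pair, then fit the log-log slope.
    mask = C > 0
    slope, _intercept = np.polyfit(np.log(radii[mask]), np.log(C[mask]), 1)
    return slope


# Example: a helix is a one-dimensional curve embedded in three dimensions.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 4.0 * np.pi, size=800)
helix = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
print(correlation_dimension(helix, r_low=0.1, r_high=0.5))
```

The estimate depends on the chosen radius range: radii that are too small leave few pairs to count, while radii that are too large pick up curvature and boundary effects, so the range above is a choice suited to this toy example rather than a general recommendation.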
