An Intrinsic Dimensionality Estimator from Near-Neighbor Information

The intrinsic dimensionality of a set of patterns is important in determining an appropriate number of features for representing the data and in deciding whether a reasonable two- or three-dimensional representation of the data exists. We propose an intuitively appealing, noniterative estimator for intrinsic dimensionality that is based on near-neighbor information. We give plausible arguments supporting the consistency of this estimator. The method works well in identifying the true dimensionality for a variety of artificial data sets and is fairly insensitive to the number of samples and to the algorithmic parameters. Comparisons between this new method and the global eigenvalue approach demonstrate the utility of our estimator.
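The abstract does not state the estimator's formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the standard scaling argument that, for locally uniform data of intrinsic dimensionality d, the mean distance to the k-th nearest neighbor grows roughly as k^(1/d), so the slope of log mean distance against log k approximates 1/d. The function names, the choice of k_max = 10, and the 95% variance threshold used for the global eigenvalue comparison are all assumptions made for this example.

```python
import numpy as np

def knn_mean_distances(X, k_max):
    """Mean Euclidean distance to the 1st..k_max-th nearest neighbor."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negative rounding errors
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor
    nearest = np.sqrt(np.sort(d2, axis=1)[:, :k_max])
    return nearest.mean(axis=0)

def near_neighbor_dimension(X, k_max=10):
    """Hypothetical estimator: 1 / slope of log(mean r_k) versus log(k)."""
    r_bar = knn_mean_distances(X, k_max)
    k = np.arange(1, k_max + 1)
    slope, _intercept = np.polyfit(np.log(k), np.log(r_bar), 1)
    return 1.0 / slope

def global_eigenvalue_dimension(X, variance_fraction=0.95):
    """Number of covariance eigenvalues needed to reach the given
    fraction of total variance (the threshold is an assumed choice)."""
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, variance_fraction) + 1)

# Artificial data: a helix, i.e. a one-dimensional curve embedded in 3-space.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 4.0 * np.pi, size=1000)
helix = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])

print("near-neighbor estimate:", near_neighbor_dimension(helix))
print("global eigenvalue estimate:", global_eigenvalue_dimension(helix))
```

On a curved set such as this helix, a local near-neighbor estimate should stay close to the curve's dimensionality of one, while a global covariance analysis sees variance spread over all three coordinate directions. This is the kind of contrast the comparison in the abstract refers to, although the specific data set here is only an assumed example.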
