Searching for the embedded manifolds in high-dimensional data, problems and unsolved questions

Starting from a review of several classical, and less classical, remarks about high-dimensional data spaces, this paper gives a bird's-eye view of various data reduction techniques, from linear multidimensional scaling to non-linear and non-parametric methods. Two kinds of approaches are presented: the first operates in the feature space, the second in the dissimilarity space. Special attention is devoted to the CCA (Curvilinear Component Analysis) algorithm, in a version that aims at capturing the mean manifold spanned by the data vectors. Some examples on artificial and real data are given.
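As a rough illustration of the linear end of this spectrum, the sketch below runs classical (Torgerson) multidimensional scaling on a dissimilarity matrix. It is not the CCA algorithm discussed in the paper, and everything in it is an assumption made for illustration: the helper names (`pairwise_distances`, `double_center`, `top_eigenpairs`, `classical_mds`), the toy data (a flat 2-D grid placed in a 5-D space), and the use of plain power iteration in place of a library eigensolver.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix: the 'dissimilarity space' view of the data."""
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(dist):
    """Convert distances into a Gram matrix B = -1/2 * J D^2 J."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means (symmetric)
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpairs(matrix, k, iterations=500, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def classical_mds(dist, k):
    """Embed n items in k dimensions from their n x n distance matrix."""
    pairs = top_eigenpairs(double_center(dist), k)
    return [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
            for i in range(len(dist))]


# Toy data (assumed): a 5 x 3 planar grid embedded linearly in 5-D space.
a = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0, 0.0]
b = [0.0, 0.0, 1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)]
points = [[x * a[d] + y * b[d] for d in range(5)]
          for x in (0.0, 1.0, 2.0, 3.0, 4.0) for y in (0.0, 0.5, 1.0)]

dist = pairwise_distances(points)
embedding = classical_mds(dist, k=2)
```

Because the toy manifold is flat, two linear dimensions are enough to reproduce the pairwise distances. A curved manifold would not be unfolded this way, which is the situation the non-linear methods surveyed here are meant to address.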
