Non-linear dimensionality reduction techniques for classification and visualization

In this paper we address the use of local embeddings for visualizing data in two and three dimensions and for classification. We advocate their use on the basis that they provide an efficient mapping from the original dimension of the data to a lower, intrinsic dimension. We show how they can accurately capture the user's perception of similarity in high-dimensional data for visualization purposes. Moreover, we exploit the low-dimensional mapping provided by these embeddings to develop new classification techniques, and we show experimentally that their classification accuracy is comparable to that of a number of other classification procedures while using fewer dimensions.
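
To make the idea concrete, the following is a minimal sketch of the general pipeline: embed the data in two dimensions with a local embedding, then classify in the embedded space. It rests on several assumptions that do not come from the abstract: locally linear embedding stands in for the local embedding, a 1-nearest-neighbour rule stands in for the classifier, the data set is synthetic, and the function names (`lle`, `one_nn`) and parameter values are illustrative only. It is not the procedure evaluated in the paper.

```python
import numpy as np


def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Locally linear embedding of the rows of X (illustrative sketch)."""
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Reconstruction weights: each point as an affine combination
    # of its nearest neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]
        C = Z @ Z.T
        # Regularise the local Gram matrix so the solve is well conditioned.
        C += reg * max(np.trace(C), 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()

    # Embedding: eigenvectors of (I - W)^T (I - W) with the smallest
    # eigenvalues, skipping the first (constant) eigenvector.
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]


def one_nn(train_Y, train_labels, test_Y):
    """Label each test point with the label of its nearest training point."""
    d2 = ((test_Y[:, None, :] - train_Y[None, :, :]) ** 2).sum(axis=2)
    return train_labels[d2.argmin(axis=1)]


# Synthetic 3-D data on a rolled-up 2-D sheet (assumed data, for illustration).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 3.0 * np.pi, size=300)
X = np.column_stack([t * np.cos(t), rng.uniform(0.0, 5.0, size=300), t * np.sin(t)])
labels = (t > np.median(t)).astype(int)  # class depends on position along the roll

# All points are embedded together, since this sketch has no out-of-sample
# mapping; only the first 200 labels are used for classification.
Y = lle(X, n_neighbors=10, n_components=2)
pred = one_nn(Y[:200], labels[:200], Y[200:])
accuracy = float((pred == labels[200:]).mean())
print(f"1-NN accuracy in the 2-D embedding: {accuracy:.2f}")
```

The two columns of `Y` can be passed directly to a scatter plot for visualization. The classifier sees only two coordinates per point instead of the original three, which mirrors, on a toy scale, the trade-off described above.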
