Selection of the Suitable Parameter Value for ISOMAP

ISOMAP is a promising technique for dimensionality reduction and data visualization. It is commonly used as a preprocessing step to avoid the "curse of dimensionality" and, in keeping with the No Free Lunch (NFL) theorem, to help select more suitable algorithms or improve the performance of the algorithms used in data mining. ISOMAP has only one parameter, the neighborhood size, and its success depends greatly on that value. How to select a suitable neighborhood size efficiently, however, remains an open problem. An unsuitable neighborhood size introduces shortcut edges into the neighborhood graph. Exploiting this characteristic feature, this paper presents an efficient method that selects a suitable neighborhood size from the decrement of the sum of all shortest path distances. In contrast with the straightforward method based on residual variance, our method only requires running the first stage of ISOMAP (the shortest path computation) incrementally, which makes it less time-consuming while yielding the same results. Experimental results confirm the feasibility and robustness of the method.
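The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy data set (an open circular arc), the relative-drop selection rule, and the use of Floyd-Warshall recomputed from scratch for every candidate size (rather than the incremental shortest path update the paper describes) are all assumptions made for illustration.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a dense edge-length matrix."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            # Keep the edge if either endpoint selects the other.
            graph[i][j] = graph[j][i] = dist[i][j]
    return graph

def sum_shortest_paths(graph):
    """Sum of all-pairs shortest path lengths (Floyd-Warshall)."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):
        dm = d[m]
        for i in range(n):
            dim = d[i][m]
            if dim == math.inf:
                continue
            di = d[i]
            for j in range(n):
                if dim + dm[j] < di[j]:
                    di[j] = dim + dm[j]
    return sum(sum(row) for row in d)

def select_neighborhood_size(points, k_values):
    """Hypothetical rule: return the size just before the largest relative
    drop in the sum of shortest path distances, plus all the sums."""
    sums = {k: sum_shortest_paths(knn_graph(points, k)) for k in k_values}
    best_k, best_drop = None, 0.0
    for prev, cur in zip(k_values, k_values[1:]):
        if math.isinf(sums[prev]):
            continue  # graph still disconnected at this size
        drop = (sums[prev] - sums[cur]) / sums[prev]
        if drop > best_drop:
            best_k, best_drop = prev, drop
    return best_k, sums

# Toy data (assumed): 40 points on an open arc covering 315 degrees.
# A shortcut edge across the 45-degree gap collapses many geodesics.
n = 40
arc = [(math.cos(1.75 * math.pi * i / (n - 1)),
        math.sin(1.75 * math.pi * i / (n - 1))) for i in range(n)]
k_values = list(range(2, 11))
best_k, sums = select_neighborhood_size(arc, k_values)
print("selected neighborhood size:", best_k)
for k in k_values:
    print(k, round(sums[k], 2))
```

On this toy arc the sum shrinks only slightly as the neighborhood grows, until the first edge bridges the gap between the two ends of the arc. At that point the sum falls sharply, and the sketch selects the last size before the fall. No embedding or residual variance is computed at any point, which is the source of the time saving claimed in the abstract.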
