Using Betweenness Centrality to Identify Manifold Shortcuts

High-dimensional data presents a significant challenge to a broad spectrum of pattern recognition and machine-learning applications. Dimensionality reduction (DR) methods remove unwanted variance and make such problems tractable. Several nonlinear DR methods, such as the well-known ISOMAP algorithm, rely on a neighborhood graph to compute geodesic distances between data points. These graphs may contain unwanted edges that connect disparate regions of one or more manifolds. This topological sensitivity is well known, yet managing high-dimensional, noisy data in the absence of a priori knowledge remains an open and difficult problem. This manuscript introduces a divisive edge-removal method, based on graph betweenness centrality, that can robustly identify manifold-shorting edges. The problem of graph construction in high dimensions is discussed, and the proposed algorithm is inserted into the ISOMAP workflow. ROC analysis is performed, and performance is evaluated on both synthetic and real datasets.
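The idea of flagging shortcut edges by betweenness can be illustrated with a minimal sketch. It is not the manuscript's algorithm, whose details are not given here. It assumes an unweighted, undirected neighborhood graph, scores each edge with a Brandes-style edge betweenness, and repeatedly removes the highest-scoring edge. The function names `edge_betweenness` and `remove_shortcuts`, the toy graph, and the fixed removal count `n_remove` are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (Brandes-style).

    adj maps each node to the set of its neighbours. Returns a dict
    {frozenset({u, v}): score}, counting each unordered node pair once.
    """
    score = defaultdict(float)
    for s in adj:
        # Breadth-first search from s: count shortest paths and predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair was visited from both of its endpoints.
    return {e: b / 2.0 for e, b in score.items()}


def remove_shortcuts(adj, n_remove):
    """Divisive pruning: drop the highest-betweenness edge, then rescore.

    n_remove is a placeholder stopping rule (an assumption of this sketch).
    """
    adj = {u: set(nb) for u, nb in adj.items()}  # work on a copy
    removed = []
    for _ in range(n_remove):
        scores = edge_betweenness(adj)
        worst = max(scores, key=scores.get)
        u, v = worst
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append(worst)
    return adj, removed


def clique(nodes):
    """Fully connected neighbourhood, standing in for one dense region."""
    nodes = list(nodes)
    return {u: {v for v in nodes if v != u} for u in nodes}


# Toy graph: two dense regions joined by a single unwanted edge (4, 5).
adj = {**clique(range(5)), **clique(range(5, 10))}
adj[4].add(5)
adj[5].add(4)

scores = edge_betweenness(adj)
pruned, removed = remove_shortcuts(adj, n_remove=1)
```

In this toy graph every shortest path between the two regions must cross the edge (4, 5), so that edge receives by far the largest score and is the first one removed. On real neighborhood graphs the edges would normally be weighted by distance, and a score threshold rather than a fixed count would decide when to stop; both are left out of the sketch.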
