The Isomap Algorithm and Topological Stability

Topological Stability Tenenbaum et al. (1) presented an algorithm, Isomap, for computing a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points. Two issues need to be raised concerning this work. First, the basic approach presented by Tenenbaum et al. is not new, having been described in the context of flattening cortical surfaces using geodesic distances and multidimensional scaling (2, 3) and in later work that used Dijkstra’s algorithm to approximate geodesic distances (4, 5). These ideas generalize to arbitrary dimensionality if the connectivity and metric information of the manifold are correctly supplied. Second, and more important, this approach is topologically unstable and can only be used after careful preprocessing of the data (6). In the application domain of cortical flattening, it is necessary to check manually for connectivity errors, so that points nearby in 3-space (for example, on opposite banks of a cortical sulcus) are not taken to be nearby in the cortical surface. If such care is taken, this method represents the preferred method for quasi-isometric cortical flattening. What is new about the Isomap algorithm is how it defines the connectivity of each data point via its nearest Euclidean neighbors in the high-dimensional space. This step is vulnerable to short-circuit errors if the neighborhood is too large with respect to folds in the manifold on which the data points lie or if noise in the data moves the points slightly off the manifold. Even a single short-circuit error can alter many entries in the geodesic distance matrix, which in turn can lead to a drastically different (and incorrect) low-dimensional embedding. We illustrate this failure in Fig. 1, using the MATLAB code published by Tenenbaum et al. along with their “Swiss roll” data, to which we have added a small amount of noise (7). Clearly, the algorithm is topologically unstable: Small errors in the data connectivity (topology) can lead to large errors in the solution. Choosing a very small neighborhood is not a satisfactory solution, as this can fragment the manifold into a large number of disconnected regions. Choosing the neighborhood “just right” requires a priori information about the global geometry of the highdimensional data manifold (8), but, presumably, it is exactly in the absence of such information that one would need to use an algorithm to find “meaningful low-dimensional structures hidden in high-dimensional observations” (1). In summary, the basic idea of Isomap has long been known, and the new component introduced by Tenenbaum et al. provides an unreliable estimate of surface connectivity, which can lead to failure of the algorithm to perform as claimed. Mukund Balasubramanian Eric L. Schwartz Department of Cognitive and Neural Systems and Department of Electrical and Computer Systems Boston University Boston, MA 02215, USA