Fast Multidimensional Scaling Through Sampling, Springs and Interpolation

The term ‘proximity data’ refers to data sets in which the similarity of pairs of objects can be assessed. Multidimensional scaling (MDS) is applied to such data and attempts to map high-dimensional objects onto a low-dimensional space while preserving these similarity relations. Standard MDS techniques have suffered from high computational complexity and, as such, could not feasibly be applied to data sets larger than a few thousand objects. Through a novel hybrid approach based upon stochastic sampling, interpolation and spring models, we have designed an algorithm that runs in O(N√N) time. Using Chalmers’ 1996 O(N²) spring model as a benchmark for the evaluation of our technique, we compare layout quality and run times on sets of synthetic and real data. Our algorithm executes significantly faster than Chalmers’ 1996 algorithm, while producing superior layouts. By reducing complexity and run time, we allow the visualisation of data sets of previously infeasible size. Our results indicate that our method is a solid foundation for interactive and visual exploration of data.
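The three ingredients named in the abstract (stochastic sampling, a spring model, and interpolation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the sample size of √N, the step size, the iteration count, and the nearest-sample placement with jitter are all assumptions chosen for brevity, and the interpolation here is a deliberate simplification. In this sketch the spring model runs only on roughly √N sampled objects, and placing the remaining objects costs roughly N·√N distance computations.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spring_layout(points, iterations=50, step=0.05, seed=0):
    """Naive spring model: quadratic in len(points) per iteration.

    Each pair is joined by a spring whose rest length is the
    high-dimensional distance; 2D positions move to relax the springs.
    """
    rng = random.Random(seed)
    n = len(points)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):
            fx = fy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d_low = math.hypot(dx, dy) or 1e-9
                d_high = dist(points[i], points[j])
                # Positive when the pair is too far apart in 2D (attract),
                # negative when too close (repel).
                f = (d_low - d_high) / d_low
                fx += f * dx
                fy += f * dy
            pos[i][0] += step * fx / max(n - 1, 1)
            pos[i][1] += step * fy / max(n - 1, 1)
    return pos


def hybrid_layout(points, seed=0, jitter=0.01):
    """Hypothetical hybrid: sample ~sqrt(N), lay out the sample, interpolate the rest."""
    rng = random.Random(seed)
    n = len(points)
    s = min(n, max(2, math.isqrt(n)))
    sample = rng.sample(range(n), s)
    sample_pos = spring_layout([points[i] for i in sample], seed=seed)

    layout = [None] * n
    for k, i in enumerate(sample):
        layout[i] = list(sample_pos[k])

    # Simplified interpolation (an assumption): place each remaining object
    # next to its most similar sampled object, found by brute force.
    for i in range(n):
        if layout[i] is not None:
            continue
        k = min(range(s), key=lambda t: dist(points[i], points[sample[t]]))
        layout[i] = [
            sample_pos[k][0] + rng.uniform(-jitter, jitter),
            sample_pos[k][1] + rng.uniform(-jitter, jitter),
        ]
    return layout
```

Calling `hybrid_layout(vectors)` on a list of equal-length numeric vectors returns one 2D position per input object, in input order.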

[1] Mu-Chun Su, et al. Fast self-organizing feature map algorithm, 2000, IEEE Trans. Neural Networks Learn. Syst.

[2] Samuel Kaski, et al. Self organization of a massive document collection, 2000, IEEE Trans. Neural Networks Learn. Syst.

[3] Matthew Chalmers, et al. A linear iteration time layout algorithm for visualising high-dimensional data, 1996, Proceedings of Seventh Annual IEEE Visualization '96.

[4] Wojciech Basalaj. Proximity visualisation of abstract data, 2001.

[5] James J. Thomas, et al. Visualizing the non-visual: spatial analysis and interaction with information from text documents, 1995, Proceedings of Visualization 1995 Conference.

[6] Andreas Buja, et al. Interactive High-Dimensional Data Visualization, 1996.

[7] Matthew Chalmers, et al. Combining and comparing clustering and layout algorithms, 2001.

[8] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.

[9] Michael L. Littman, et al. XGvis: Interactive Data Visualization with Multidimensional Scaling, 1998.

[10] Peter Eades, et al. A Heuristic for Graph Drawing, 1984.

[11] Steven K. Feiner, et al. Worlds within worlds: metaphors for exploring n-dimensional virtual worlds, 1990, UIST '90.

[12] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[13] Shokri Z. Selim, et al. K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] A. J. Collins, et al. Introduction To Multivariate Analysis, 1981.

[15] Robert McGill, et al. The Many Faces of a Scatterplot, 1984.

[16] Kerry Rodden, et al. Does organisation by similarity assist image browsing?, 2001, CHI.

[17] Piotr Indyk, et al. Similarity Search in High Dimensions via Hashing, 1999, VLDB.

[18] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[19] Robert Spence, et al. Externalising abstract mathematical models, 1996, CHI '96.

[20] Paul S. Bradley, et al. Refining Initial Points for K-Means Clustering, 1998, ICML.

[21] Michael L. Littman, et al. Visualizing the embedding of objects in Euclidean space, 1992.

[22] Dominique Brodbeck, et al. Combining topological clustering and multidimensional scaling for visualizing large data sets, 1998.

[23] Gary Marchionini, et al. A self-organizing semantic map for information retrieval, 1991, SIGIR '91.

[24] Gonzalo Navarro, et al. Fixed Queries Array: A Fast and Economical Data Structure for Proximity Searching, 2001, Multimedia Tools and Applications.

[25] Matthew Chalmers, et al. A hybrid layout algorithm for sub-quadratic multidimensional scaling, 2002, IEEE Symposium on Information Visualization, 2002. INFOVIS 2002.