Improving hybrid MDS with pivot-based searching

An algorithm is presented for the visualization of multidimensional abstract data, building on a hybrid model introduced at InfoVis 2002. The most computationally complex stage of the original model involved performing a nearest-neighbour search for every data item. The complexity of this stage has been reduced by treating all high-dimensional relationships as a set of discretised distances to a constant number of randomly selected pivot items. Removing this bottleneck reduces the overall complexity from $O(N\sqrt{N})$ to $O(N^{5/4})$. As well as documenting this improvement, the paper describes an evaluation with a data set of 108,000 14-dimensional items, a considerable increase on the size of data previously tested. Results show that the reduction in complexity is reflected in significantly improved run times, with no negative impact on the quality of the layout produced.
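The pivot idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the equal-width bucketing scheme, the Euclidean metric and the parameter values are all assumptions made for illustration. Each randomly chosen pivot sorts the items into buckets by discretised distance to that pivot, and a query is then compared only against items that share a bucket with it under at least one pivot, rather than against all N items.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_buckets(items, num_pivots, num_buckets, rng):
    """Pre-process items into distance buckets around random pivots.

    Hypothetical helper: for each pivot, every item is placed in one of
    num_buckets equal-width bands according to its distance to the pivot.
    """
    pivot_ids = rng.sample(range(len(items)), num_pivots)
    index = []
    for p in pivot_ids:
        dists = [euclidean(items[p], item) for item in items]
        # Guard against a zero width when all items coincide with the pivot.
        width = (max(dists) / num_buckets) or 1.0
        buckets = [[] for _ in range(num_buckets)]
        for i, d in enumerate(dists):
            buckets[min(int(d / width), num_buckets - 1)].append(i)
        index.append((p, width, buckets))
    return index


def approx_nearest(query, items, index, exclude=None):
    """Approximate nearest neighbour of query using the pivot buckets.

    Only items sharing a bucket with the query (for some pivot) are
    examined, so the result may differ from the exact nearest neighbour.
    Returns (item_index, distance), or (None, inf) if no candidate exists.
    """
    best, best_d = None, math.inf
    for p, width, buckets in index:
        b = min(int(euclidean(query, items[p]) / width), len(buckets) - 1)
        for i in buckets[b]:
            if i == exclude:
                continue
            d = euclidean(query, items[i])
            if d < best_d:
                best, best_d = i, d
    return best, best_d


rng = random.Random(0)
# 200 synthetic 14-dimensional items (the dimensionality matches the paper's
# data set; the values themselves are random placeholders).
items = [[rng.random() for _ in range(14)] for _ in range(200)]
index = build_pivot_buckets(items, num_pivots=4, num_buckets=5, rng=rng)
neighbour, distance = approx_nearest(items[7], items, index, exclude=7)
```

The result is approximate by design: a true nearest neighbour that falls just across a bucket boundary for every pivot is missed, which is the trade-off accepted in exchange for examining far fewer candidates per query.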