Similarity clustering of dimensions for an enhanced visualization of multidimensional data

The order and arrangement of dimensions (variates) are crucial for the effectiveness of many visualization techniques, such as parallel coordinates, scatterplots, recursive pattern, and many others. We describe a systematic approach to arranging the dimensions according to their similarity. The basic idea is to rearrange the data dimensions such that dimensions showing similar behavior are positioned next to each other. For the similarity clustering of dimensions, we need to define similarity measures that determine the partial or global similarity of dimensions. We then consider the problem of finding an optimal one- or two-dimensional arrangement of the dimensions based on their similarity. Theoretical considerations show that both the one- and the two-dimensional arrangement problems are surprisingly hard, i.e., they are NP-complete. Our solution is therefore based on heuristic algorithms. An empirical evaluation using a number of different visualization techniques shows the high impact of our similarity clustering of dimensions on the visualization results.
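The one-dimensional arrangement idea can be illustrated with a minimal sketch. It is not the algorithm of the paper: the abstract names neither a specific similarity measure nor a specific heuristic, so the sketch assumes Euclidean distance between min-max normalized dimensions as a global dissimilarity measure and a greedy nearest-neighbor ordering as the heuristic. The function names `normalize`, `dissimilarity`, and `greedy_order` are hypothetical.

```python
import math


def normalize(column):
    """Min-max scale one dimension to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]


def dissimilarity(a, b):
    """Assumed global measure: Euclidean distance between two normalized dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_order(columns, start=0):
    """Assumed heuristic: repeatedly append the unplaced dimension that is
    most similar to the last placed one (ties broken by lower index)."""
    norm = [normalize(c) for c in columns]
    order = [start]
    remaining = set(range(len(norm))) - {start}
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: (dissimilarity(norm[last], norm[j]), j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # Each inner list is one dimension observed over four data items.
    dims = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8], [8, 6, 4, 2]]
    print(greedy_order(dims))
```

A greedy ordering of this kind gives no optimality guarantee, which is consistent with the NP-completeness result: the arrangement problem resembles a traveling-salesman-style path problem, so any practical method trades optimality for running time.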
