Visualization and data mining of high-dimensional data

Abstract: Visualization provides insight through images and can be considered a collection of application-specific mappings: ProblemDomain → VisualRange. For the visualization of multivariate problems, a multidimensional system of parallel coordinates (abbreviated as ∥-coords) is constructed, which induces a one-to-one mapping between subsets of N-space and subsets of 2-space. The result is a rigorous methodology for doing and seeing N-dimensional geometry. An overview of the mathematical foundations shows that the display of high-dimensional datasets and the search for multivariate relations among the variables are transformed into a 2-D pattern recognition problem. This is the basis for the application to Visual Data Mining, which is illustrated with a real dataset from Very Large Scale Integration (VLSI, i.e. "chip") production. Then a recent geometric classifier is presented and applied to three real datasets. Compared with the results of 23 other classifiers, it has the least error. The algorithm has quadratic computational complexity in the size and number of parameters, provides comprehensible and explicit rules, performs dimensionality selection (finding the minimal set of original variables required to state the rule), and orders these variables so as to optimize the clarity of separation between the designated set and its complement. Finally, a simple visual economic model of a real country is constructed and analyzed to illustrate the special strength of ∥-coords in modeling multivariate relations by means of hypersurfaces.
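The mapping from N-space to 2-space described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names (`to_polyline`, `dual_point`, `y_on_segment`) and the unit axis spacing are assumptions made here. The sketch shows the two standard facts behind parallel coordinates: an N-dimensional point becomes a polyline across N parallel vertical axes, and in the 2-D case all points on a line x2 = m·x1 + b (with m ≠ 1) become segments that pass through one common point in the plane.

```python
from typing import List, Sequence, Tuple


def to_polyline(point: Sequence[float], spacing: float = 1.0) -> List[Tuple[float, float]]:
    """Map an N-dimensional point to the vertices of its polyline.

    The i-th coordinate is plotted on a vertical axis placed at x = i * spacing
    (axis placement is an assumption of this sketch).
    """
    return [(i * spacing, float(v)) for i, v in enumerate(point)]


def dual_point(m: float, b: float, d: float = 1.0) -> Tuple[float, float]:
    """Return the plane point representing the line x2 = m*x1 + b.

    Axis X1 is at x = 0 and axis X2 at x = d. For m == 1 the segments are
    parallel and meet only at infinity, so no finite point exists.
    """
    if m == 1:
        raise ValueError("m == 1 maps to an ideal point (direction), not a finite point")
    return (d / (1 - m), b / (1 - m))


def y_on_segment(x1: float, x2: float, x: float, d: float = 1.0) -> float:
    """Height at horizontal position x of the straight line through
    (0, x1) and (d, x2), i.e. the line carrying the segment for (x1, x2)."""
    return x1 + (x / d) * (x2 - x1)


if __name__ == "__main__":
    print(to_polyline([3, 1, 4]))
    px, py = dual_point(m=-1.0, b=2.0)
    # Every point on x2 = -x1 + 2 yields a segment through (px, py).
    for x1 in (0.0, 1.0, 3.0):
        print(x1, y_on_segment(x1, -x1 + 2.0, px), py)
```

Running the script prints the polyline vertices and confirms numerically that the segments for several points on the same line share the point returned by `dual_point`, which is the point-line duality that turns the search for linear relations into a 2-D pattern recognition task.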
