Visualization of Diversity in Large Multivariate Data Sets

Understanding the diversity of a set of multivariate objects is an important problem in many domains, including ecology, college admissions, investing, and machine learning. To date, however, very little work has been done to help users achieve this kind of understanding. Visual representation is especially appealing for this task because it can let users observe the objects of interest efficiently, directly, and holistically. In this paper, we therefore attempt to formalize the problem of visualizing the diversity of a large (more than 1000 objects), multivariate (more than 5 attributes) data set, and we present it as one that merits deeper investigation by the information visualization community. In doing so, we contribute a precise definition of diversity, a set of requirements for diversity visualizations based on this definition, and a formal user study design for evaluating how well a visual representation communicates diversity information. Our primary contribution, however, is the Diversity Map, a visual representation designed for visualizing diversity. An evaluation of the Diversity Map using our study design shows that users can judge elements of diversity consistently, and at least as accurately as when using the only other representation specifically designed to visualize diversity.
