Multi-Dimensional Data Visualization

Historically, data visualization has been limited primarily to two dimensions (e.g., histograms, scatter plots). Available software packages (e.g., Data Desk 6.1, MatLab 6.1, SAS-JMP 4.04, SPSS 10.0) can produce 3-D scatter plots with varying degrees of user interactivity. We constructed our own data visualization application with The Visualization Toolkit (Schroeder, Martin, & Lorensen, 1998) and Tcl/Tk to display multivariate data through the application of glyphs (Ware, 2000). A glyph is a visual object onto which many data parameters may be mapped, each with a different visual attribute (e.g., size, color). We used our Multi-Dimensional Data Viewer to explore data from several psycholinguistic experiments. Its graphical interface gives users flexibility as they dynamically explore the multi-dimensional image rendered from raw experimental data. We highlight advantages of multi-dimensional data visualization and consider some potential limitations.

Multi-Dimensional Data Visualization

Data visualization has become an increasingly popular method for displaying and exploring complex (multivariate) scientific data (see Schroeder, Martin, & Lorensen, 1998, for an overview). Simply stated, raw experimental, theoretical, or demographic data are transformed into an image or a series of images, and exploring the resulting image(s) is the essence of data visualization. A variety of techniques exist to extract patterns from data (Marchak, 1994). Each technique has the potential to elucidate aspects of the data that are typically obscured, or simply not captured, by measures of central tendency or dispersion. One method, ‘data spinning,’ may be particularly well suited to the exploratory analysis of multivariate data (Marchak, 1994). Data spinning consists of rotating data points in three-dimensional (3-D) space. Rotation can be interactive (user-controlled) or passive (animated).
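Data spinning amounts to repeatedly rotating every data point about an axis. The minimal Python sketch below assumes rotation about the vertical (y) axis only. Passive spinning steps the angle automatically to produce animation frames, whereas interactive spinning derives the angle from the user's mouse movement. The function names and the drag gain are hypothetical, not taken from any of the packages named above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_about_y(points: List[Point], angle_deg: float) -> List[Point]:
    """Rotate every point about the vertical (y) axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Right-handed rotation about y: x' = x cos a + z sin a, z' = -x sin a + z cos a
    return [(x * c + z * s, y, -x * s + z * c) for x, y, z in points]


def spin_frames(points: List[Point], n_frames: int) -> List[List[Point]]:
    """Passive (animated) spinning: one full turn split into n_frames equal steps."""
    step = 360.0 / n_frames
    return [rotate_about_y(points, i * step) for i in range(n_frames)]


def angle_from_drag(dx_pixels: float, degrees_per_pixel: float = 0.5) -> float:
    """Interactive spinning: convert a horizontal mouse drag into an angle (assumed gain)."""
    return dx_pixels * degrees_per_pixel


if __name__ == "__main__":
    data = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
    for frame in spin_frames(data, 4):
        print([tuple(round(v, 3) for v in p) for p in frame])
```

Because the y coordinate passes through unchanged, each point keeps its height and its distance from the vertical axis; only its position around that axis changes from frame to frame.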
Several computer software applications allow users to display and rotate data in 3-D space. Some programs, however, offer limited user interactivity (e.g., SPSS 10.0), while others are costly (e.g., Data Desk 6.1, MatLab 6.1, SAS-JMP 4.04) or difficult to obtain (e.g., MacSpin). Consequently, we constructed our own data visualization application with The Visualization Toolkit (Schroeder et al., 1998) and Tcl/Tk to facilitate the rapid display of multivariate data. We used our Multi-Dimensional Data Viewer (MDDV) to explore data from several psycholinguistic experiments (Feldman & Pastizzo, 2001; Pastizzo & Feldman, 2002).

Graphical representations of data in a spatial array can facilitate the comprehension and analysis of many types of data. Perhaps the greatest benefit of data visualization is the ability to explore aspects of data that are not revealed by standard statistical measures (for a related argument, see Loftus, 1993). The inclusion of exploratory data analysis (EDA; Smith & Prentice, 1993) and graphical data analysis (GDA; Wainer & Thissen, 1993) in statistics handbooks further suggests that researchers in the behavioral sciences are coming to appreciate and use graphical methods of data analysis. The core principle of EDA is, not surprisingly, to explore the data; to this end, Smith and Prentice (1993) advocated graphical depictions such as stem-and-leaf plots, box plots, and scatter plots.

Historically, data visualization has been limited primarily to two dimensions (e.g., histograms, scatter plots). Advances in computer technology, however, have promoted more sophisticated graphical displays. In the framework of scientific visualization, Castellan (1991) argued, “[Powerful graphics] should enable scientists to better understand complex phenomena – particularly dynamic systems” (p. 108).
That is, developments in computer hardware and software have produced enhanced graphics that can help scientists visualize physical and, more recently, psychological phenomena of a complex, interactive nature. For example, researchers in the physical sciences have used 3-D data visualization techniques to explore how variables (e.g., velocity, friction, gravity) interact to jointly determine physical phenomena (e.g., motion). Analogously, as we discover variables that influence cognitive behavior, we can map these variables onto dependent measures of behavior (e.g., response latencies and accuracy rates). Within psycholinguistics, many studies have established that word properties (e.g., word frequency, word length, word family size) determine the latency to identify a printed word presented in isolation; to explain variation in response latencies, therefore, we need an account of how these variables interact. Although we often restrict ourselves to statistical measures to capture patterns among multiple variables, interactions are by nature complex and difficult to interpret. The simultaneous display of these variables with 3-D graphics therefore has the potential to supplement conventional accounts.

Several statistical software packages (e.g., Data Desk 6.1, MatLab 6.1, SAS-JMP 4.04, SPSS 10.0) can plot three (categorical or continuous) variables in a 3-D scatter plot. In addition to the three primary variables, users can specify a fourth (categorical) variable to designate group membership, differentiated by color, shape, and/or size. In general, such packages provide many useful tools for data exploration, including (but not limited to) display rotation, zooming, and point identification. Typically, variables are displayed in a 3-D space that the user can rotate with mouse movements and enlarge or shrink with button presses.
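The conventional 3-D scatter plot described above can be sketched in a few lines of Python (our choice of language, not that of the packages named). Three variables are rescaled so each spans one axis, and a fourth, categorical variable selects a color that marks group membership. The palette, function names, and example values are hypothetical, and the sketch only computes positions and colors; drawing them is left to whatever graphics library is at hand.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[float, float, float]
Point = Tuple[float, float, float]

# Hypothetical palette: one distinguishable color per group level.
PALETTE: List[RGB] = [
    (0.85, 0.10, 0.10),  # red
    (0.10, 0.45, 0.85),  # blue
    (0.10, 0.65, 0.20),  # green
    (0.95, 0.60, 0.05),  # orange
]


def normalize(values: Sequence[float]) -> List[float]:
    """Rescale one variable linearly to [0, 1] so every axis spans the same extent."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant variable: center it on the axis
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def scatter3d_with_groups(
    x: Sequence[float], y: Sequence[float], z: Sequence[float], groups: Sequence[str]
) -> List[Tuple[Point, RGB]]:
    """Pair each point's axis position with the color of its group (the 4th, categorical variable)."""
    levels = sorted(set(groups))
    if len(levels) > len(PALETTE):
        raise ValueError("more group levels than palette colors")
    color_of: Dict[str, RGB] = {g: PALETTE[i] for i, g in enumerate(levels)}
    xs, ys, zs = normalize(x), normalize(y), normalize(z)
    return [((xi, yi, zi), color_of[g]) for xi, yi, zi, g in zip(xs, ys, zs, groups)]


if __name__ == "__main__":
    # Made-up values for illustration only (not data from the experiments cited).
    pts = scatter3d_with_groups(
        x=[2.1, 3.4, 1.8],      # e.g., log word frequency
        y=[5, 7, 4],            # e.g., word length
        z=[610, 580, 650],      # e.g., response latency (ms)
        groups=["related", "unrelated", "related"],
    )
    for position, color in pts:
        print(position, color)
```

Note that only the group variable is limited to a handful of discrete levels; this is why such displays handle a fourth categorical variable easily but not a fourth continuous one.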
The capability to display four (or more) continuous variables simultaneously, or to select variable ranges dynamically, is less common, however. Cost and/or design limitations inspired us to create our own graphical tool. Its enhanced graphical user interface permits dynamic rotation with an unlimited number of variables and dynamic selection of variable ranges.

Design Parameters for the Multi-Dimensional Data Viewer

There are at least two critical aspects of data visualization: (1) the resultant image and (2) the user interface. Each is discussed in turn.

First, the most straightforward way to generate an image of unstructured data points is a simple scatter plot. Traditional scatter plots capture the data in a 2-D or 3-D space; because each variable requires its own dimension, these plots can display only two or three variables. New hardware and software have made it possible to depict a larger number of variables visually, with greater speed and ease. The Visualization Toolkit (VTK) provides one powerful option. VTK is an object-oriented software library (implemented in C++ and usable from interpreted languages such as Tcl) that uses a visualization pipeline to transform raw data into glyphs plotted in a 3-D space. In essence, a glyph is a visual object onto which many data parameters may be mapped, each with a different visual attribute. Additional dimensions (variables), beyond the three standard orientation axes, can generally be mapped onto glyphs through (1) scalar mapping and/or (2) color or texture mapping. The present paper demonstrates the use of scalar and color mapping to represent a fourth or fifth dimension of a data set. VTK includes simple commands to create such an image: visual objects are placed in 3-D space according to the values of specific data variables. Additional data parameters, or redundant mappings of the same parameter, help control the “appearance” of each visual object and therefore provide greater visual reinforcement.
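The glyph idea can be sketched without VTK. In the hypothetical Python below, each data row holds five values: the first three place a sphere on the orientation axes, the fourth is scalar-mapped to sphere radius, and the fifth is color-mapped through a simple two-color lookup table. An optional range filter mimics dynamic range selection. In VTK, classes such as vtkGlyph3D and vtkLookupTable play roughly these roles, but the names, radius limits, and blue-to-red color scheme here are assumptions, not the MDDV implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]
Row = Tuple[float, float, float, float, float]  # (x, y, z, dim4, dim5)


@dataclass
class Glyph:
    position: Tuple[float, float, float]  # dimensions 1-3: the orientation axes
    radius: float  # dimension 4: scalar mapping to sphere size
    color: RGB  # dimension 5: color mapping


def linear_map(v: float, lo: float, hi: float, out_lo: float, out_hi: float) -> float:
    """Map v from [lo, hi] onto [out_lo, out_hi], clamping values outside [lo, hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2
    t = min(max((v - lo) / (hi - lo), 0.0), 1.0)
    return out_lo + t * (out_hi - out_lo)


def lookup_color(v: float, lo: float, hi: float) -> RGB:
    """Assumed two-color lookup table: blue at the low end, red at the high end."""
    t = linear_map(v, lo, hi, 0.0, 1.0)
    return (t, 0.0, 1.0 - t)


def make_glyphs(
    rows: Sequence[Row],
    select: Optional[Tuple[int, float, float]] = None,
    min_radius: float = 0.02,
    max_radius: float = 0.10,
) -> List[Glyph]:
    """Build one sphere glyph per row; `select` = (column, lo, hi) keeps only rows in range."""
    if not rows:
        return []
    # Scalar ranges come from the full data set, so glyphs keep their size/color under selection.
    lo4, hi4 = min(r[3] for r in rows), max(r[3] for r in rows)
    lo5, hi5 = min(r[4] for r in rows), max(r[4] for r in rows)
    glyphs: List[Glyph] = []
    for r in rows:
        if select is not None:
            col, lo, hi = select
            if not lo <= r[col] <= hi:
                continue  # dynamic range selection: hide rows outside the chosen range
        glyphs.append(
            Glyph(
                position=(r[0], r[1], r[2]),
                radius=linear_map(r[3], lo4, hi4, min_radius, max_radius),
                color=lookup_color(r[4], lo5, hi5),
            )
        )
    return glyphs
```

Computing the scalar ranges over the full data set rather than the selected subset is a design choice of this sketch: a sphere keeps the same size and color as the user narrows the selected range, so comparisons across selections remain meaningful.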
The final 2-D image is generated by projecting the three-dimensional scene onto a two-dimensional plane. The user’s position and orientation within the 3-D space determine this projection; as a result, images can be generated from any point of view.

Figure 1. Screen shot of the transparency application, showing a sphere within a sphere. Three variables are mapped to the three primary axes. A fourth dimension is mapped to inner sphere size, and a fifth dimension is mapped to outer sphere size. Sphere size reveals the relation between the fourth and fifth dimensions and has higher potential resolution than other techniques such as grayscale value.

As quickly becomes evident, the environment is well suited to data exploration. Because relationships between dimensions are not necessarily established in advance, the visualization environment lets one explore the data to find relationships and important results that are not immediately obvious. To this end, the environment allows selective display of data set points. For example, a transparency filter can display data parameters selectively, so that overlapping data points become visually distinct through mappings with different transparency values. Additionally, mapping data values to sphere size creates spheres inside of spheres, which can reveal overlapping points or depict the magnitude of a data point on a fourth and fifth dimension (see Figure 1). In essence, transparency allows inner spheres to be seen without actually removing the outer spheres: a high transparency setting corresponds to clear objects, while a low transparency setting corresponds to opaque objects.

The user interface is as important as the data image.
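Two operations described above, projecting the scene from the user's viewpoint and seeing an inner sphere through a translucent outer one, can each be sketched in a few lines of Python. The sketch assumes a pinhole camera that looks down its own -z axis and can turn only about the vertical axis (yaw), and it treats opacity as 1 - transparency in a standard "over" blend. VTK's own camera and blending are more general; the function names here are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
RGB = Tuple[float, float, float]


def world_to_camera(p: Vec3, eye: Vec3, yaw_deg: float) -> Vec3:
    """Express p relative to a camera at `eye` that is turned yaw_deg about the vertical axis."""
    x, y, z = p[0] - eye[0], p[1] - eye[1], p[2] - eye[2]
    a = math.radians(-yaw_deg)  # rotate the world opposite to the camera's own turn
    c, s = math.cos(a), math.sin(a)
    return (x * c + z * s, y, -x * s + z * c)


def project(p: Vec3, eye: Vec3, yaw_deg: float = 0.0, focal: float = 1.0) -> Optional[Tuple[float, float]]:
    """Pinhole perspective projection onto the image plane; the camera looks down its -z axis."""
    x, y, z = world_to_camera(p, eye, yaw_deg)
    if z >= 0.0:
        return None  # at or behind the viewer: not visible
    return (focal * x / -z, focal * y / -z)


def seen_through(outer: RGB, inner: RGB, transparency: float) -> RGB:
    """'Over' blend of a translucent outer sphere in front of an opaque inner sphere.

    transparency = 1.0 -> outer sphere is clear (only the inner color shows);
    transparency = 0.0 -> outer sphere is opaque (only the outer color shows).
    """
    opacity = 1.0 - transparency
    r, g, b = (opacity * o + transparency * i for o, i in zip(outer, inner))
    return (r, g, b)
```

For example, a point directly to the camera's left, such as (-5, 0, 0) seen from the origin, is not visible at yaw 0 but lands at the image center once the camera turns 90 degrees, which is the sense in which the user's position and orientation determine the projection.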
The user interface is the primary tool for exploration in a data visualization; it allows users to explore the rich image of glyphs generated to capture multiple dimensions. Rotation is the simplest form of user interaction with a 3-D image. For example, SPSS 10.0 users achieve the desired view orientation by separately adjusting yaw (left-right), pitch (up-down), and roll (side-to-side tilt). VTK, however, includes a more

[1] N. John Castellan. Computers and computing in psychology: Twenty years of progress and still a bright future. 1991.

[2] Frank M. Marchak. An overview of scientific visualization techniques applied to experimental psychology. 1994.

[3] M. Braga, et al. Exploratory Data Analysis. 2018. Encyclopedia of Social Network Analysis and Mining (2nd ed.).

[4] Laurie Beth Feldman, et al. Discrepancies between orthographic and unrelated baselines in masked priming undermine a decompositional account of morphological facilitation. 2002. Journal of Experimental Psychology: Learning, Memory, and Cognition.

[5] Frank M. Marchak, et al. Dynamic graphics in the exploratory analysis of multivariate data. 1990.

[6] Colin Ware, et al. Information Visualization: Perception for Design. 2000.

[7] Howard Wainer, et al. A Handbook for Data Analysis in the Behavioral Sciences: Statistical Issues. 1993.

[8] Howard Wainer, et al. Graphical data analysis. 1981.

[9] Frank M. Marchak, et al. The effectiveness of dynamic graphics in revealing structure in multivariate data. 1992.

[10] Geoffrey R. Loftus, et al. A picture is worth a thousand p values: On the irrelevance of hypothesis testing in the microcomputer age. 1993.

[11] Frank M. Marchak, et al. Interactive versus passive dynamics and the exploratory analysis of multivariate data. 1991.

[12] Gideon Keren, et al. A Handbook for Data Analysis in the Behavioral Sciences: Statistical Issues. 1993.