Visualizing Categorical Data Arising in the Health Sciences Using Hammock Plots

I introduce the hammock plot, a new plot for visualizing categorical data and mixed categorical/continuous data. It is well suited for all types of categorical data: unordered categorical data, ordered categorical data, and interval data. It can be viewed as a generalization of parallel coordinate plots in which the lines are replaced by rectangles whose width is proportional to the number of observations they represent. In addition to the rectangles, hammock plots incorporate univariate descriptors such as category labels into the graph. I illustrate the hammock plot with examples from the health sciences.

Introduction

There are four commonly used plots for visualizing high-dimensional data: scatter plot matrices (Hartigan 1975), Mosaic plots (Hartigan and Kleiner 1991; Friendly 1994; Theus 1996), parallel coordinate plots (Inselberg 1984; Wegman 1990), and Trellis displays (Becker et al. 1996; Theus 1999). Scatter plot matrices and parallel coordinate plots are best suited for continuous data. Overplotting, which occurs when more than one observation is assigned to the same physical space on the plot, is sometimes a problem for these plots. A common way to handle (unordered) categorical data is to assign each category a numerical value. When these values are plotted, an extreme amount of overplotting occurs, because all observations in the same category are drawn at the same position. Consequently, neither scatter plot matrices nor parallel coordinate plots do well with categorical data. Jittering, i.e. adding spherical noise to each observation, can alleviate the overplotting problem somewhat. Mosaic plots were conceived to display categorical data and are not suited for continuous data at all. There are several types of categorical data: unordered categorical data, ordered categorical data, and interval data. Interval data are ordered data for which the separation between data points has meaning (Agresti, p. 3).
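To make the idea of replacing lines by rectangles more concrete, here is a minimal Python sketch; it is not the author's implementation. For two adjacent axes it counts how many observations share each pair of categories (in a parallel coordinate plot these observations would all be drawn on the same line segment, which is the overplotting described above) and gives each pair a width proportional to that count. It also places interval variables at positions proportional to their values, so that the distance between categories stays visible, while other categories are spaced equally. The function names `hammock_segments` and `axis_positions`, the `max_width` scaling, the linear placement rule, and the toy data are hypothetical assumptions made for illustration.

```python
from collections import Counter


def hammock_segments(var_a, var_b, max_width=1.0):
    """Return {(category_a, category_b): (count, width)} for two adjacent axes.

    Assumption (hypothetical): widths scale linearly with the count, and the
    most frequent category pair receives max_width.
    """
    if len(var_a) != len(var_b):
        raise ValueError("both variables need one value per observation")
    # Observations sharing a category pair would overplot on one line segment
    # in a parallel coordinate plot; here they are pooled into one rectangle.
    counts = Counter(zip(var_a, var_b))
    largest = max(counts.values())
    return {pair: (n, max_width * n / largest) for pair, n in counts.items()}


def axis_positions(categories, interval=False):
    """Map category values to vertical positions in [0, 1] on one axis.

    Assumption (hypothetical): interval data are placed proportionally to their
    numeric values; other categories are spaced equally in the order given.
    """
    if interval:
        lo, hi = min(categories), max(categories)
        span = (hi - lo) or 1  # avoid dividing by zero for a constant variable
        return {c: (c - lo) / span for c in categories}
    ordered = list(dict.fromkeys(categories))  # drop duplicates, keep order
    step = 1 / (len(ordered) - 1) if len(ordered) > 1 else 0.0
    return {c: i * step for i, c in enumerate(ordered)}


# Toy data invented for illustration only (not from any study).
sex = ["F", "F", "M", "M", "M", "F"]
comorbidities = [0, 0, 1, 1, 1, 2]

for pair, (n, width) in hammock_segments(sex, comorbidities).items():
    print(pair, n, round(width, 2))

print(axis_positions([0, 1, 4], interval=True))  # {0: 0.0, 1: 0.25, 4: 1.0}
print(axis_positions([0, 1, 4]))                 # {0: 0.0, 1: 0.5, 4: 1.0}
```

With the toy data, six observations collapse onto only three distinct category pairs, and the most frequent pair receives the full width. Treating the comorbidity counts as interval data places 1 closer to 0 than to 4, whereas equal spacing hides that distance.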
The number of comorbidities of a patient and the number of children of a parent are examples of interval data. Mosaic plots are well suited for unordered and ordered categorical data. However, Mosaic plots treat interval data like ordered categorical data, so the distance between categories is not visually apparent. Neither Mosaic plots, scatter plot matrices, nor parallel coordinate plots are well suited for data that have both categorical and continuous variables. In Trellis displays, one specific type of plot (e.g. a scatter plot or a box plot) is displayed for each subset of the data defined by the conditioning variables. These plots are then arranged as a panel. For example, one might display two continuous variables and one categorical variable as a panel of scatter plots, one for