Interactive Visualization of Multivariate Statistical Data

Abstract—This paper introduces web-based interactive Linked Micromap (LM) plots. These are a set of dynamic visualization methods that let readers interactively select variables and modify the different views, which helps reveal relationships among the study units. This methodology provided the foundation for the web-based micromaps used by the National Cancer Institute (NCI). It illustrates how visualization can make statistical summaries of health and risk factors for millions of people accessible to health planners who may never have taken a statistics class. The LM plot methodology is in use by the Department of Agriculture and readily extends to applications in other agencies in the United States. The interactive methods can be as useful in such extensions as they were for the National Cancer Institute.

Index Terms—Interactive visualization, Linked Micromap (LM) plots, statistical data display, web-based visualization.

I. INTRODUCTION

Linked Micromap (LM) plots constitute a new template for the display of spatially indexed statistical summaries [1, 2]. This template has four key features:

1) displaying at least three parallel sequences of panel types that include micromaps, labels, and statistical summaries;
2) sorting of study units;
3) partitioning of the study units into perceptual grouping panels to focus attention on a few units at a time; and
4) positional linking of perceptual grouping panels across panel types, and linking of study units across panel types, typically using color and often using position.

Static LM plots have been used to visualize data sets of varying size, complexity, and domain [3, 4, 5]. The first effort toward a web application involved estimates for hundreds of hazardous air pollutants, available for US states, counties, and even census tracts [6, 7, 8]. This Environmental Protection Agency (EPA) research was stopped before the web site was released to the public because of concerns about data quality and public reaction.
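Features 2 through 4 (sorting, partitioning into perceptual groups, and color linking) amount to a small layout computation. The Python sketch below shows that computation. It is not the NCI implementation. The function `lm_layout`, the `Row` record, the default group size of five, and the five-color `PALETTE` reused within each group are all hypothetical choices made for illustration. The sketch only assigns each study unit a panel and a color and does not draw micromaps.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical palette: one color per row position within a perceptual group.
PALETTE = ["red", "orange", "green", "blue", "purple"]


@dataclass
class Row:
    unit: str    # study unit label (e.g., a state name)
    value: float # statistical summary shown in the statistics panel
    group: int   # index of the perceptual grouping panel
    color: str   # color linking label, statistic, and micromap region


def lm_layout(data: Sequence[Tuple[str, float]], group_size: int = 5,
              descending: bool = True) -> List[Row]:
    """Sort study units and partition them into perceptual groups (sketch)."""
    if not 1 <= group_size <= len(PALETTE):
        raise ValueError("group_size must be between 1 and len(PALETTE)")
    # Feature 2: sort the study units by the selected statistic.
    ordered = sorted(data, key=lambda item: item[1], reverse=descending)
    rows = []
    for rank, (unit, value) in enumerate(ordered):
        # Feature 3: consecutive runs of group_size units share one panel.
        group, position = divmod(rank, group_size)
        # Feature 4: the unit gets the same color in every panel type of its row.
        rows.append(Row(unit, value, group, PALETTE[position]))
    return rows


if __name__ == "__main__":
    # Made-up illustrative values, not cancer statistics.
    rates = [("A", 3.0), ("B", 7.5), ("C", 5.1), ("D", 9.2),
             ("E", 1.4), ("F", 6.6), ("G", 4.0)]
    for row in lm_layout(rates, group_size=3):
        print(row.group, row.color, row.unit, row.value)
```

Colors are assigned by position within a group, so each color appears at most once per panel. Within any single panel, a color therefore identifies exactly one unit across the label, statistic, and micromap columns. Python's `sorted` is stable, so tied values keep their input order. This keeps the layout reproducible when a reader re-sorts interactively.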
The LM methodology introduced below is new, except that it uses the general map boundaries from the EPA-funded research. We use cancer statistics from the National Cancer Institute (NCI) as an application and implementation example to present the methods. The displays show test data rather than official cancer statistics, but they still provide an excellent test-bed for studying statistical visualization methodology. We have implemented a full set of LM plots for recent cancer statistical summaries of the United States at the state and county levels. The test-bed list of selectable cancer sites is restricted to breast, colon, prostate, and lung, but it can readily be extended. These web-based interactive LM plots preserve all the key features of the originally published LM plots. Some spatial resolution is lost in the transition from the printed page to a computer monitor. However, the interactive viewing options allow better visualization through drill-down views, multiple levels of detail, sorting, magnified micromaps, a miniature overall statistical summary, confidence interval switching, and other interactive visualization methods. None of these interactive methods is new individually, but their integration with LM plots provides a new approach to communicating spatially indexed statistical summaries over the Internet. Since the Internet is a widely accessible source of public information, the web-based implementation of LM plots will make information available to more readers, and the new interactivity can lead to more involvement and better understanding. This paper introduces