The Generalized Pairs Plot

This article develops a generalization of the scatterplot matrix, motivated by the recognition that most datasets include both categorical and quantitative information. Traditional grids of scatterplots often obscure important features of the data when one or more variables are categorical but coded as numerical. The generalized pairs plot offers a range of displays for paired combinations of categorical and quantitative variables. Two categorical variables may be displayed with a mosaic plot, fluctuation diagram, or faceted bar chart. A categorical and a quantitative variable may be displayed with side-by-side boxplots, a stripplot, faceted histograms, or density plots. A traditional scatterplot is suitable for a pair of numerical variables, but options also support density contours or annotations of summary statistics such as the correlation and the number of missing values. By combining these displays, the generalized pairs plot may help reveal structure in multivariate data that might otherwise go unnoticed during exploratory data analysis. Two R packages, gpairs and GGally, provide implementations of the generalized pairs plot. Supplementary materials for this article are available online on the journal web site.
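The pairing logic described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the gpairs or GGally implementation; the function names (`is_categorical`, `panel_type`, `pairs_layout`), the `categorical` argument, and the choice of one default display per pairing are all assumptions made for illustration. The sketch only decides which kind of panel each pair of variables would receive, and it lets the caller declare numerically coded categorical variables explicitly, since those are the case the abstract says traditional scatterplot grids handle poorly.

```python
def is_categorical(values):
    """Assumed rule: a column is categorical if any non-missing value is not a plain number."""
    return any(
        not isinstance(v, (int, float)) or isinstance(v, bool)
        for v in values
        if v is not None
    )


def panel_type(x_is_cat, y_is_cat):
    """Pick one default display per pairing (each pairing has several options in the abstract)."""
    if x_is_cat and y_is_cat:
        return "mosaic"        # two categorical variables
    if x_is_cat or y_is_cat:
        return "boxplot"       # one categorical, one quantitative
    return "scatterplot"       # two quantitative variables


def pairs_layout(data, categorical=()):
    """Map each off-diagonal (row, column) pair of variable names to a panel type.

    `data` is a dict of column name -> list of values. `categorical` names
    columns to treat as categorical even though they are coded as numbers.
    """
    declared = set(categorical)
    is_cat = {
        name: name in declared or is_categorical(values)
        for name, values in data.items()
    }
    return {
        (row, col): panel_type(is_cat[col], is_cat[row])
        for row in data
        for col in data
        if row != col
    }


# Hypothetical data: "site" is a category coded as integers.
example = {
    "species": ["a", "b", "a"],
    "length": [1.2, 3.4, 2.2],
    "site": [1, 2, 1],
}
layout = pairs_layout(example, categorical={"site"})
for pair, kind in sorted(layout.items()):
    print(pair, kind)
```

Without the `categorical={"site"}` declaration, the pair of `site` and `length` would be assigned a scatterplot, which is the numeric-coding pitfall the generalized pairs plot is meant to avoid.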
