ggfortify: Unified Interface to Visualize Statistical Results of Popular R Packages

The ggfortify package provides a unified interface that enables users to use one line of code to visualize statistical results of many R packages using ggplot2 idioms. With the help of ggfortify, statisticians, data scientists, and researchers can avoid the sometimes repetitive work of using the ggplot2 syntax to achieve what they need.

Background

R users have many plotting options to choose from, such as base graphics, grid graphics, and lattice graphics (Sarkar, 2008). Each has its own customization and extensibility options. In recent years, ggplot2 has emerged as a popular choice for creating visualizations (Wickham, 2009). It provides a strong programming model based on a “grammar of graphics” which enables methodical production of virtually any kind of statistical chart. The ggplot2 package makes it possible to describe a wide range of graphics with succinct syntax and independent components, and it is based on an object-oriented model that also makes it modular and extensible. It has become a widely used framework for producing statistical graphics in R.

The distinct syntax of ggplot2 makes it a definite paradigm shift from base and lattice graphics and presents a somewhat steep learning curve for those used to existing R charting idioms. Often users only want to quickly visualize some statistical results from key R packages, especially those focusing on clustering and time series analysis. Many of these packages provide default base plot() visualizations for the data and models they generate. These components require transformation before they can be used in ggplot2, and each of those transformation steps must be replicated by others when they wish to produce similar charts in their analyses. Creating a central repository for common transformations and default plotting idioms would reduce the effort needed by everyone to create compelling, consistent, and informative charts.
To achieve this, we provide a unified ggplot2 plotting interface to many statistics and machine-learning packages and functions in order to help these users achieve reproducibility goals with minimal effort. The ggfortify (Horikoshi and Tang, 2015) package has an easy-to-use and uniform programming interface that enables users to use one line of code to visualize statistical results of many popular R packages using ggplot2 as a foundation. This helps statisticians, data scientists, and researchers avoid both repetitive work and the need to identify the correct ggplot2 syntax to achieve what they need. With ggfortify, users are able to generate beautiful visualizations of their statistical results produced by popular packages with minimal effort.

Software architecture

There are many ways to extend the functionality of ggplot2. One straightforward way is through the use of S3 generic functions [1]. Specifically, it is possible to provide custom functions for:

• autoplot(), which enables plotting a custom object with ggplot2, and
• fortify(), which enables converting a custom object to a tidy "data.frame".

The ggfortify package uses this extensibility to provide default ggplot2 visualizations and data transformations. To illustrate this, we consider the implementations of fortify.prcomp() and autoplot.pca_common(), which are used as the basis of other PCA-related implementations:

    fortify.prcomp <- function(model, data = NULL, ...) {
      if (is(model, "prcomp")) {
        d <- as.data.frame(model$x)
        values <- model$x %*% t(model$rotation)
      } else if (is(model, "princomp")) {
        d <- as.data.frame(model$scores)
        values <- model$scores %*% t(model$loadings[,])
      } else {
        stop(paste0("Unsupported class for fortify.pca_common: ", class(model)))
      }
      values <- ggfortify::unscale(values, center = model$center,
                                   scale = model$scale)
      values <- cbind_wraps(data, values)
      d <- cbind_wraps(values, d)
      post_fortify(d)
    }

[1] http://adv-r.had.co.nz/S3.html

The R Journal Vol. 8/2, December 2016, ISSN 2073-4859, Contributed Research Articles

This S3 function recognizes "prcomp" objects and extracts the necessary components from them, such as the matrix whose columns contain the eigenvectors ("rotation") and the rotated data ("x"), which can be drawn using autoplot() later on. The if() call is used here to handle different objects that belong to essentially the same principal components family, since they can be handled in exactly the same way once the necessary components are extracted by ggfortify.

The following autoplot.pca_common() function first calls fortify() to perform the component extraction for different PCA-related objects, then performs some common data preparation for those objects, and finally calls ggbiplot() internally to handle the actual plotting.

    autoplot.pca_common <- function(object, data = NULL, scale = 1.0, ...) {
      plot.data <- ggplot2::fortify(object, data = data)
      plot.data$rownames <- rownames(plot.data)
      if (is_derived_from(object, "prcomp")) {
        x.column <- "PC1"
        y.column <- "PC2"
        loadings.column <- "rotation"
        lam <- object$sdev[1L:2L]
        lam <- lam * sqrt(nrow(plot.data))
      } else if (is_derived_from(object, "princomp")) {
        ...
      } else {
        stop(paste0("Unsupported class for autoplot.pca_common: ", class(object)))
      }
      # common and additional preparation before plotting
      ...
      p <- ggbiplot(plot.data = plot.data, loadings.data = loadings.data, ...)
      return(p)
    }

Once ggfortify is loaded, users have instant access to 38 pre-defined autoplot() functions and 36 pre-defined fortify() functions, enabling them to immediately autoplot() numerous types of objects or pass those objects directly to ggplot2 for manual customization. Furthermore, ggfortify is highly extensible and customizable and provides utility functions that make it easy for users to define autoplot() and fortify() methods for their own custom objects.

To present a streamlined API, ggfortify groups common implementations for various object types, including:

• Time-series
• Principal components analysis (PCA), including clustering and multi-dimensional scaling (MDS)
• 1d/2d kernel density estimation (KDE)
• Survival analysis
• Cartography

A list of currently supported packages and classes can be found in Table 1. Additional packages that are in development are not shown here, but more than 50 object types are supported by ggfortify.

Table 1: Supported packages

    package       supported types
    base          "matrix", "table"
    cluster       "clara", "fanny", "pam"
    changepoint   "cpt"
    dlm           "dlmFilter", "dlmSmooth"
    fGarch        "fGARCH"
    forecast      "bats", "forecast", "ets", "nnetar"
    fracdiff      "fracdiff"
    glmnet        "cv.glmnet", "glmnet"
    KFAS          "KFS", "signal"
    lfda          "lfda", "klfda", "self"
    MASS          "isoMDS", "sammon"
    maps          "map"
    sp            "SpatialPoints", "SpatialPolygons", "Line", "Lines",
                  "Polygon", "Polygons", "SpatialLines",
                  "SpatialLinesDataFrame", "SpatialPointsDataFrame",
                  "SpatialPolygonsDataFrame"
    stats         "HoltWinters", "lm", "acf", "ar", "Arima", "stepfun",
                  "stl", "ts", "cmdscale", "decomposed.ts", "density",
                  "factanal", "glm", "kmeans", "princomp", "spec"
    survival      "survfit", "survfit.cox"
    strucchange   "breakpoints", "breakpointsfull"
    timeSeries    "timeSeries"
    tseries       "irts"
    vars          "varprd"
    xts           "xts"
    zoo           "zooreg"

Feedback is being collected from users [2] for possible bug fixes and future enhancements.
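As a minimal sketch of how this S3 extension mechanism might be applied to a user-defined class, the following code defines fortify() and autoplot() methods for a hypothetical "running_mean" class. The class, its constructor, and the column names are assumptions made purely for illustration and are not part of ggfortify; only the fortify() and autoplot() generics exported by ggplot2 are used, and none of the ggfortify utility functions are assumed.

```r
library(ggplot2)

# Hypothetical S3 class "running_mean" (illustrative only, not from any package).
running_mean <- function(x, window = 3) {
  stopifnot(is.numeric(x), window >= 1, window <= length(x))
  # stats::filter() with equal weights and sides = 2 gives a centred moving
  # average; positions without a full window are NA.
  smoothed <- as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
  structure(list(raw = x, smoothed = smoothed, window = window),
            class = "running_mean")
}

# fortify() method: convert the custom object to a tidy data.frame.
fortify.running_mean <- function(model, data = NULL, ...) {
  data.frame(index = seq_along(model$raw),
             raw = model$raw,
             smoothed = model$smoothed)
}

# autoplot() method: build a default ggplot from the fortified data.
autoplot.running_mean <- function(object, ...) {
  plot.data <- fortify(object)
  ggplot(plot.data, aes(x = index)) +
    geom_line(aes(y = raw), colour = "grey60") +
    geom_line(aes(y = smoothed), colour = "blue", na.rm = TRUE)
}

# Usage: the result is an ordinary ggplot object, so further layers can be added.
rm_obj <- running_mean(as.numeric(AirPassengers), window = 12)
p <- autoplot(rm_obj) + labs(title = "Raw series and running mean")
```

Splitting the work this way mirrors the pattern described above: fortify() owns the data transformation, while autoplot() only decides on default geoms, so users who want a different chart can call fortify() themselves and pass the resulting data frame to ggplot2 directly. In a package, the two methods would additionally need to be registered as S3 methods in the NAMESPACE file.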
[2] https://github.com/sinhrks/ggfortify/issues

Illustrations

As previously stated, ggfortify provides methods that enable ggplot2 to work with objects of different classes from different R packages. The following subsections illustrate how to use ggfortify to plot results from several of these packages.

Principal components analysis

The ggfortify package defines both fortify() and autoplot() methods for the two core PCA functions in the stats package: stats::prcomp() and stats::princomp(). The values returned by either function can be passed directly to ggplot2::autoplot() as illustrated in the following code and in Figure 1. Note that users can also specify a column to be used for the colour aesthetic.

    library(ggfortify)
    df <- iris[c(1, 2, 3, 4)]
    autoplot(prcomp(df), data = iris, colour = "Species")

Figure 1: PCA with colors for each class.

If label = TRUE is specified, as shown in Figure 2, ggfortify will draw labels for each data point. Users can also specify the size of the labels via label.size. If shape = FALSE is specified, the shape of the data points will be removed, leaving only the labels on the plot.

    autoplot(prcomp(df), data = iris, colour = "Species",
             shape = FALSE, label.size = 3)

Figure 2: PCA with colors and labels for each class.

The autoplot() function returns the constructed ggplot2 object, so users can apply additional ggplot2 code to further enhance the plot. For example:

    autoplot(prcomp(df), data = iris, colour = "Species",
             shape = FALSE, label.size = 3) +
      labs(title = "Principal Component Analysis")

Users can also specify loadings = TRUE to draw the PCA eigen-vectors.
More aesthetic options, such as the size and color of the eigen-vector labels, can also be specified, as shown in Figure 3 and the following code:

    autoplot(prcomp(df), data = iris, colour = "Species",
             loadings = TRUE, loadings.colour = 'blue',
             loadings.label = TRUE, loadings.label.size = 3)

Figure 3: PCA with eigen-vectors and labels.

Linear models

The ggfortify package is able to interpret lm() fitted model objects and allows the user to select the subset of desired plots through the which parameter (just like the plot.lm() function). The ncol and nrow parameters also allow users to specify the number of subplot columns and rows, as seen in Figure 4 and the following code:

    par(mfrow = c(1, 2))
    m <- lm(Petal.Width ~ Petal.Length, data = iris)
    autoplot(m, which = 1:6, ncol = 3, label.size = 3)

Many plot aesthetics can be changed by using the appropriate named parameters. For example, the colour parameter is for coloring data points, the smooth.colour p