Data Visualization and Statistical Graphics in Big Data Analysis

This article discusses the role of data visualization in analyzing big data. We describe the historical origins of statistical graphics, from the birth of exploratory data analysis to the impact of statistical graphics on practice today. We present examples of contemporary data visualizations used to explore airline traffic, global standardized test scores, election monitoring, Wikipedia edits, the housing crisis as observed in San Francisco, and the mining of credit card databases. We also review the recent literature. Good data visualization yields better models and predictions and allows for the discovery of the unexpected.
