Visualizing and Inspecting Large Datasets with Tableplots

More and more researchers study large data sources. Because of their size alone, gaining insight into the data in these sources is difficult. A visualization method commonly referred to as a tableplot was found to be extremely useful for this purpose. A tableplot displays the aggregated distribution patterns of a dozen variables in a single figure. We demonstrate that it also yields information on data quality and on the presence and selectivity of missing data. In our opinion, the tableplot is a very valuable addition to the standard set of statistical tools commonly used for data exploration, processing, and analysis. A tool to create tableplots has been implemented as a package for the open source statistical software environment R and made publicly available.
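To make the idea of aggregated distribution patterns concrete, the following is a minimal sketch of the kind of aggregation a tableplot relies on. It is not the code of the R package. It assumes the usual construction: rows are sorted by one variable, split into equally sized row bins, and each bin is summarised by the mean of every numeric variable and the category fractions of every categorical variable, with missing values counted separately. The function name `tableplot_bins`, its parameters, and the example column names are hypothetical, and Python is used here only for illustration.

```python
from collections import Counter
from statistics import mean


def tableplot_bins(rows, sort_key, n_bins=100):
    """Summarise rows per row bin after sorting by sort_key (descending).

    rows is a list of dicts sharing the same keys; None marks a missing value.
    sort_key is assumed to name a numeric column.
    """
    # Rows with a missing sort value are placed last.
    ordered = sorted(
        rows,
        key=lambda r: (r[sort_key] is None, -(r[sort_key] or 0)),
    )
    columns = list(ordered[0]) if ordered else []
    # A column is treated as numeric if all its non-missing values are numbers.
    numeric = {
        col: all(
            isinstance(r[col], (int, float))
            for r in ordered
            if r[col] is not None
        )
        for col in columns
    }

    n = len(ordered)
    summaries = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than rows
        summary = {"n": len(chunk)}
        for col in columns:
            values = [r[col] for r in chunk]
            present = [v for v in values if v is not None]
            missing = 1 - len(present) / len(values)
            if numeric[col]:
                # Numeric variable: bin mean plus fraction of missing values.
                summary[col] = {
                    "mean": mean(present) if present else None,
                    "missing": missing,
                }
            else:
                # Categorical variable: fraction per category, with missing
                # values shown as a category of their own.
                counts = Counter(
                    "missing" if v is None else v for v in values
                )
                summary[col] = {k: c / len(values) for k, c in counts.items()}
        summaries.append(summary)
    return summaries
```

Plotting one bar per bin and per variable from these summaries would give a figure in the spirit of a tableplot. A rising share of the `"missing"` category along the sorted variable is the kind of pattern that hints at selective missingness.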
