Position statement: The case for a visualization performance benchmark

Visualizations are an invaluable tool in the data analysis process, as they enable scientists to explore and interpret billions of datapoints quickly with just a few rendered images. However, many visualization systems are unable to keep up with the unprecedented accumulation of data from remote sensors, field sensors, medical and personal devices, social networks, and more. This is because many of these tools rely on certain assumptions, such as the assumption that an entire dataset can be stored directly in main memory. With so many massive datasets available, ranging from the NASA MODIS satellite imagery dataset [3] to the Internet Movie Database [4] to Twitter streams [1], this assumption no longer matches reality.

[1] Helen J. Wang, et al. Online aggregation, 1997, SIGMOD '97.

[2] Arnab Nandi, et al. Distributed and interactive cube exploration, 2014, 2014 IEEE 30th International Conference on Data Engineering.

[3] Monica M. C. Schraefel, et al. Trust me, I'm partially right: incremental visualization lets analysts explore large datasets faster, 2012, CHI.

[4] Pat Hanrahan, et al. Maintaining interactivity while exploring massive time series, 2008, 2008 IEEE Symposium on Visual Analytics Science and Technology.

[5] Kanit Wongsuphasawat, et al. Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations, 2016, IEEE Transactions on Visualization and Computer Graphics.

[6] Mark A. Whiting, et al. Threat stream data generator: creating the known unknowns for test and evaluation of visual analytics tools, 2006, BELIV '06.

[7] Michael Stonebraker, et al. GenBase: a complex analytics genomics benchmark, 2014, SIGMOD Conference.

[8] Dominik Moritz, et al. What Users Don't Expect about Exploratory Data Analysis on Approximate Query Processing Systems, 2017, HILDA@SIGMOD.

[9] Raghunath Othayoth Nambiar, et al. Transaction Processing Performance Council (TPC): State of the Council 2010, 2010, TPCTC.

[10] Stanley B. Zdonik, et al. Query Steering for Interactive Data Exploration, 2013, CIDR.

[11] Jean-Daniel Fekete, et al. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE Transactions on Visualization and Computer Graphics, 2022.

[12] Carlos Eduardo Scheidegger, et al. Hashedcubes: Simple, Low Memory, Real-Time Visual Exploration of Big Data, 2017, IEEE Transactions on Visualization and Computer Graphics.

[13] Jeffrey Heer, et al. imMens: Real-time Visual Querying of Big Data, 2013, Comput. Graph. Forum.

[14] L. Toledo-Pereyra. Trust, 2006, Mediation Behaviour.

[15] Alex Endert, et al. Finding Waldo: Learning about Users from their Interactions, 2014, IEEE Transactions on Visualization and Computer Graphics.

[16] Ippokratis Pandis, et al. TPC-E vs. TPC-C: characterizing the new TPC-E benchmark via an I/O comparison study, 2011, SIGMOD Rec.

[17] Leilani Battle, et al. Behavior-driven optimization techniques for scalable data exploration, 2017.

[18] Michael Stonebraker, et al. Dynamic Prefetching of Data Tiles for Interactive Visualization, 2016, SIGMOD Conference.

[19] David H. Laidlaw, et al. A Case Study Using Visualization Interaction Logs and Insight Metrics to Understand How Analysts Arrive at Insights, 2016, IEEE Transactions on Visualization and Computer Graphics.

[20] Jeffrey Heer, et al. Profiler: integrated statistical analysis and visualization for data quality assessment, 2012, AVI.

[21] Adam Silberstein, et al. Benchmarking cloud serving systems with YCSB, 2010, SoCC '10.

[22] H. Reza Taheri, et al. TPC-V: A Benchmark for Evaluating the Performance of Database Applications in Virtual Environments, 2010, TPCTC.

[23] Carlos Eduardo Scheidegger, et al. Nanocubes for Real-Time Exploration of Spatiotemporal Datasets, 2013, IEEE Transactions on Visualization and Computer Graphics.

[24] Michael J. Cafarella, et al. Visualization-aware sampling for very large databases, 2015, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[25] Carsten Binnig, et al. Towards a Benchmark for Interactive Data Exploration, 2016, IEEE Data Eng. Bull.

[26] Michael Stonebraker, et al. E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing, 2014, Proc. VLDB Endow.

[27] Aditya G. Parameswaran, et al. SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics, 2015, Proc. VLDB Endow.

[28] Kanit Wongsuphasawat, et al. Voyager 2: Augmenting Visual Analysis with Partial View Specifications, 2017, CHI.

[29] Carsten Binnig, et al. The case for interactive data exploration accelerators (IDEAs), 2016, HILDA '16.