Mr. Plotter: Unifying Data Reduction Techniques in Storage and Visualization Systems

As the rate of data collection continues to grow rapidly, developing visualization tools that scale to immense data sets is a serious and ever-increasing challenge. Existing approaches generally seek to decouple storage and visualization systems, performing just-in-time data reduction to transparently avoid overloading the visualizer. We present a new architecture in which the visualizer and data store are tightly coupled. Unlike systems that read raw data from storage, our system delivers performance that scales linearly with the size of the final visualization and is essentially independent of the size of the data. Thus, it scales to massive data sets while supporting interactive performance (sub-100 ms query latency). This enables a new class of visualization clients that manage data automatically, requesting it from the underlying database quickly and transparently, without requiring the user to explicitly initiate queries. The architecture lays the groundwork for truly interactive exploration of big data and opens new directions for research on scalable information visualization systems.
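The abstract does not spell out how query cost can track the size of the visualization rather than the size of the data. One common way to get that property is to have the store keep precomputed summaries at several resolutions and let the client ask for the coarsest level that still fills its pixels. The sketch below illustrates that general idea only; the names `Summary`, `build_pyramid`, and `query`, and the power-of-two bucket layout, are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Summary:
    lo: float     # minimum value in the bucket
    hi: float     # maximum value in the bucket
    total: float  # sum of values, so the mean is total / count
    count: int    # number of raw points in the bucket


def build_pyramid(values: List[float]) -> List[List[Summary]]:
    """Hypothetical store layout: level k has one summary per 2**k raw points."""
    level = [Summary(v, v, v, 1) for v in values]
    pyramid = [level]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]  # one or two neighbouring buckets
            merged.append(Summary(
                min(s.lo for s in pair),
                max(s.hi for s in pair),
                sum(s.total for s in pair),
                sum(s.count for s in pair),
            ))
        level = merged
        pyramid.append(level)
    return pyramid


def query(pyramid: List[List[Summary]], start: int, end: int,
          pixels: int) -> List[Summary]:
    """Summaries covering raw indices [start, end), at most pixels + 2 of them."""
    span = max(end - start, 1)
    k = 0
    # Climb to the first level whose buckets are wide enough for the screen.
    while k + 1 < len(pyramid) and (span >> k) > pixels:
        k += 1
    first = start >> k
    last = ((end - 1) >> k) + 1
    return pyramid[k][first:last]


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    pyramid = build_pyramid(data)
    buckets = query(pyramid, 0, 1000, pixels=10)
    print(len(buckets), buckets[0].lo, buckets[-1].hi)
```

In this sketch the work done by `query` is bounded by the number of pixels requested, not by the length of `data`. The cost of building the summaries still grows with the data, so a real store would presumably maintain them as data is written rather than rebuilding them per query.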
