Marviq: Quality-Aware Geospatial Visualization of Range-Selection Queries Using Materialization

We study the problem of efficiently visualizing a large spatial data set stored in a database, where each visualization is requested through a SQL query with ad-hoc range conditions on numerical attributes. An example is a spatial scatterplot of taxi pickup events in New York between 1/1/2015 and 3/10/2015. We present a novel middleware-based technique called Marviq. It divides the domain of the selection attribute into intervals, then precomputes and stores a visualization for each interval. These precomputed results are called MVS and are stored as tables in the database. For a given request, we can compute an exact visualization by accessing the MVS and retrieving the additional records from the base table. To further reduce the time spent retrieving records from the base table, we present algorithms that use the MVS to compute an approximate visualization satisfying a user-specified similarity threshold. We show a family of functions with certain properties that can use this technique. We also present an improvement that divides the MVS intervals into smaller intervals and materializes low-resolution visualizations for these smaller intervals. We report the results of an extensive evaluation of Marviq, including a user study, and show its high performance in both space and time.
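The exact-visualization path described above can be illustrated with a minimal in-memory sketch. It is not Marviq's implementation: it assumes that a visualization is a binary raster (the set of lit pixels), that the selection attribute is an integer, and that the intervals have a fixed width. The names `WIDTH`, `GRID`, `pixel`, `build_mvs`, `scan`, and `visualize` are all hypothetical, and `scan` stands in for a range query against the base table.

```python
from bisect import bisect_left

WIDTH = 10  # hypothetical interval width on the selection attribute
GRID = 4    # hypothetical raster resolution (GRID x GRID cells)

def pixel(x, y):
    """Map a point in the unit square to a raster cell."""
    return (min(int(x * GRID), GRID - 1), min(int(y * GRID), GRID - 1))

def build_mvs(records):
    """Precompute one pixel set per interval [k*WIDTH, (k+1)*WIDTH)."""
    mvs = {}
    for t, x, y in records:
        mvs.setdefault(t // WIDTH, set()).add(pixel(x, y))
    return mvs

def scan(records, keys, lo, hi):
    """Stand-in for a base-table range scan over lo <= t < hi."""
    start, end = bisect_left(keys, lo), bisect_left(keys, hi)
    return {pixel(x, y) for _, x, y in records[start:end]}

def visualize(records, keys, mvs, lo, hi):
    """Exact raster for lo <= t < hi: stored intervals plus leftover records."""
    first = -(-lo // WIDTH)  # first interval lying fully inside [lo, hi)
    last = hi // WIDTH       # one past the last fully covered interval
    if first >= last:        # no interval is fully covered
        return scan(records, keys, lo, hi)
    pixels = set()
    for k in range(first, last):
        pixels |= mvs.get(k, set())                   # reuse precomputed results
    pixels |= scan(records, keys, lo, first * WIDTH)  # left remainder
    pixels |= scan(records, keys, last * WIDTH, hi)   # right remainder
    return pixels

# Toy data: (t, x, y) records sorted by the selection attribute t.
records = sorted((t, (t * 37 % 100) / 100, (t * 53 % 100) / 100)
                 for t in range(0, 100, 3))
keys = [r[0] for r in records]
mvs = build_mvs(records)
result = visualize(records, keys, mvs, 13, 47)
```

In this sketch, the query on `[13, 47)` reuses the stored pixel sets for `[20, 40)` and scans only the records in `[13, 20)` and `[40, 47)`. The approximate variant, which would skip some of these scans while still meeting the similarity threshold, is not shown here.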
