Towards insight-driven sampling for big data visualisation

ABSTRACT Creating an interactive, accurate, and low-latency big data visualisation is challenging because of the volume, variety, and velocity of the data. Visualisation options range from visualising the entire dataset, which can be slow and taxing on the system, to visualising a small subset, which is faster and less taxing but can yield a less useful visualisation because of information loss. This work investigates two main research questions: what effect sampling has on visualisation insight, and how to guide users in navigating this trade-off. To investigate them, we study an initial case of simple estimation tasks on histogram visualisations of sampled big data, in the hope that the results generalise. We use sampling to generate subsets of large datasets and create visualisations for a crowd-sourced study involving a simple cognitive visualisation task. Using the results of this study, we quantify insight, sampling, visualisation, and perception error relative to the full dataset. We use these results to model the relationship between sample size and insight error, and we propose using our model to guide big data visualisation sampling.
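The sampling-error component mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's method: the synthetic Gaussian data, uniform random sampling without replacement, the bin count, and the error metric (total variation distance between normalised histograms) are all assumptions chosen for illustration, and the function names are hypothetical. The sketch covers only how a sampled histogram diverges from the full-data histogram, not perception or insight error, which require human participants.

```python
import random


def histogram(values, lo, hi, bins):
    """Return normalised bin frequencies over [lo, hi] using equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if width == 0:
            idx = 0  # degenerate case: all values are identical
        else:
            # Clamp so that v == hi lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def sampling_error(full, sample, bins=20):
    """Total variation distance between sample and full-data histograms.

    Assumed metric for illustration only; 0 means identical histograms,
    1 means the histograms do not overlap at all.
    """
    lo, hi = min(full), max(full)  # shared bin edges taken from the full data
    h_full = histogram(full, lo, hi, bins)
    h_samp = histogram(sample, lo, hi, bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(h_full, h_samp))


def error_by_sample_size(full, sizes, bins=20, seed=0):
    """Draw one uniform sample (without replacement) per size and score it."""
    rng = random.Random(seed)
    return {n: sampling_error(full, rng.sample(full, n), bins) for n in sizes}


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for a large dataset (assumption, not the study's data).
    data = [rng.gauss(50.0, 15.0) for _ in range(100_000)]
    results = error_by_sample_size(data, [100, 1_000, 10_000, 100_000])
    for n, err in results.items():
        print(f"sample size {n:>7}: histogram error {err:.4f}")
```

Plotting the resulting error against sample size gives a rough picture of the trade-off the abstract describes: small samples are cheap to render but distort the histogram more, and a sample equal to the full dataset has zero sampling error by construction.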
