Large interactive visualization of density functions on big data infrastructure

Point-set visualization underlies many visualization techniques. Scatter plots and geographic heatmaps are straightforward examples, and data analysts are now well trained in their use. As datasets keep growing, these techniques must scale as fast as the data does. Big data infrastructure makes it possible to scale horizontally, so designing point-set visualization methods that fit this paradigm is a crucial challenge. In this paper, we present a complete architecture that fits fully into the big data paradigm and thereby enables interactive visualization of heatmaps at ultra-scale. We give a new distributed algorithm for multi-scale aggregation of point sets and propose an adaptive GPU-based method for kernel density estimation. A complete prototype working with Hadoop, HBase, Spark and WebGL has been implemented, and we benchmark it on a dataset of more than 2 billion points.
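The abstract names two building blocks, multi-scale aggregation of a point set and kernel density estimation (KDE), without detailing them. As a rough illustration only, the sketch below bins points into a quadtree-style pyramid in which level z has 2^z x 2^z cells and each coarser level is obtained by summing 2x2 child cells. That summing step is the kind of keyed reduction a MapReduce or Spark job can distribute. The sketch then evaluates a Gaussian KDE, f(x) = 1/(n h^2) * sum_i K((x - x_i) / h), using the cell counts of one level as weighted samples (a binned KDE). The unit-square coordinates, the Gaussian kernel, and the function names `aggregate` and `binned_kde` are assumptions for illustration. This is a single-machine CPU sketch, not the paper's distributed algorithm or its adaptive GPU estimator.

```python
import math
from collections import Counter


def aggregate(points, max_level):
    """Build a pyramid of per-cell point counts for levels 0..max_level.

    Assumption: (x, y) points lie in the unit square [0, 1]^2. Level z divides
    the square into 2**z x 2**z cells, similar to a map-tile pyramid.
    """
    n = 2 ** max_level
    # Finest level: each point is binned into one cell (a "map"-like step).
    finest = Counter(
        (min(int(x * n), n - 1), min(int(y * n), n - 1)) for x, y in points
    )
    pyramid = {max_level: finest}
    # Coarser levels: sum the 2x2 children of every parent cell (a "reduce"-like step).
    for z in range(max_level - 1, -1, -1):
        parent = Counter()
        for (i, j), count in pyramid[z + 1].items():
            parent[(i // 2, j // 2)] += count
        pyramid[z] = parent
    return pyramid


def binned_kde(cells, level, bandwidth):
    """Evaluate a 2D Gaussian KDE at the cell centres of one pyramid level.

    Each occupied cell acts as a single sample at its centre, weighted by its
    count (a binned approximation of the exact KDE over the raw points).
    """
    size = 2 ** level
    total = sum(cells.values())
    grid = [[0.0] * size for _ in range(size)]
    if total == 0:
        return grid
    # Isotropic Gaussian kernel normalisation, divided by the sample count n.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * total)
    for gi in range(size):
        for gj in range(size):
            cx, cy = (gi + 0.5) / size, (gj + 0.5) / size
            acc = 0.0
            for (i, j), count in cells.items():
                px, py = (i + 0.5) / size, (j + 0.5) / size
                d2 = (cx - px) ** 2 + (cy - py) ** 2
                acc += count * math.exp(-d2 / (2.0 * bandwidth ** 2))
            grid[gi][gj] = acc * norm
    return grid
```

For example, `aggregate([(0.1, 0.1), (0.12, 0.11), (0.9, 0.9)], 3)` places the first two points in cell `(0, 0)` and the third in cell `(7, 7)` at level 3, and collapses all three into the single cell `(0, 0)` at level 0. The brute-force KDE costs one kernel evaluation per (grid cell, occupied cell) pair, which is why evaluating it from a coarse aggregated level rather than from billions of raw points matters, and why the abstract turns to a GPU-based estimator for interactivity.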
