Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries

Rapidly growing data sizes of scientific simulations pose significant challenges for interactive visualization and analysis techniques. In this work, we propose a compact probabilistic representation to interactively visualize large scattered datasets. In contrast to previous approaches that represent blocks of volumetric data using probability distributions, we model clusters of arbitrarily structured multivariate data. In detail, we discuss how to efficiently represent and store a high-dimensional distribution for each cluster. We observe that it suffices to consider low-dimensional marginal distributions for two or three data dimensions at a time to employ common visual analysis techniques. Based on this observation, we represent high-dimensional distributions by combinations of low-dimensional Gaussian mixture models. We discuss the application of common interactive visual analysis techniques to this representation. In particular, we investigate several frequency-based views, such as density plots in 1D and 2D, density-based parallel coordinates, and a time histogram. We visualize the uncertainty introduced by the representation, discuss a level-of-detail mechanism, and explicitly visualize outliers. Furthermore, we propose a spatial visualization by splatting anisotropic 3D Gaussians for which we derive a closed-form solution. Lastly, we describe the application of brushing and linking to this clustered representation. Our evaluation on several large, real-world datasets demonstrates the scaling of our approach.
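As an illustration of the frequency-based views mentioned above, the following minimal sketch assembles a 1D density plot from per-cluster 1D Gaussian mixture models. It is not the authors' implementation. The cluster layout (a dictionary with `count`, `weights`, `means`, `stds`), the function names, and the weighting of each cluster by its point count are assumptions made for this example.

```python
import numpy as np

def gmm_pdf_1d(x, weights, means, stds):
    """Evaluate a 1D Gaussian mixture density at the points x."""
    x = np.asarray(x, dtype=float)[:, None]        # shape (n, 1)
    w = np.asarray(weights, dtype=float)[None, :]  # shape (1, k)
    mu = np.asarray(means, dtype=float)[None, :]
    sd = np.asarray(stds, dtype=float)[None, :]
    comp = np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return (w * comp).sum(axis=1)

def density_plot_1d(x, clusters):
    """Sum the per-cluster mixture densities, weighted by cluster point counts.

    Assumption: each cluster contributes in proportion to the number of
    data points it summarizes, so the result is a density over all points.
    """
    total = sum(c["count"] for c in clusters)
    density = np.zeros(len(x))
    for c in clusters:
        share = c["count"] / total
        density += share * gmm_pdf_1d(x, c["weights"], c["means"], c["stds"])
    return density

# Hypothetical summaries of two clusters for a single data dimension.
clusters = [
    {"count": 1000, "weights": [0.7, 0.3], "means": [0.0, 2.0], "stds": [0.5, 0.3]},
    {"count": 3000, "weights": [1.0], "means": [5.0], "stds": [1.0]},
]
x = np.linspace(-5.0, 12.0, 2001)
density = density_plot_1d(x, clusters)
```

Because only the mixture parameters and a point count are stored per cluster, the cost of evaluating such a view depends on the number of clusters and mixture components rather than on the number of original data points.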
