High performance multivariate visual data exploration for extremely large data

One of the central challenges in modern science is the need to quickly derive knowledge and understanding from large, complex collections of data. We present a new approach to this challenge that combines and extends techniques from high performance visual data analysis and scientific data management. We demonstrate the approach in the context of gaining insight from complex, time-varying datasets produced by a laser wakefield accelerator simulation. The approach uses histogram-based parallel coordinates both as a visual information display and as a vehicle for guiding a data mining operation. Data extraction and subsetting are implemented with state-of-the-art index/query technology. Although applied here to accelerator science, the approach is generally applicable to a broad set of science applications and is implemented in a production-quality visual data analysis infrastructure. We conduct a detailed performance analysis and demonstrate good scalability on a distributed-memory Cray XT4 system.
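To make the idea of histogram-based parallel coordinates concrete, the following is a minimal sketch and not the implementation described here. It assumes each record is a dictionary of numeric attributes and builds one 2D histogram per pair of adjacent axes, so the display cost depends on the number of bins rather than the number of records. The function names (`bin_index`, `axis_pair_histograms`, `select`), the attribute names, and the sample values are all hypothetical, and the range query is a plain linear scan standing in for the index/query technology, which is not reproduced.

```python
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to a bin index in 0..bins-1."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * bins)
    return min(max(i, 0), bins - 1)  # clamp so value == hi lands in the last bin


def axis_pair_histograms(records, axes, ranges, bins):
    """Build one 2D histogram (as a Counter) per pair of adjacent axes."""
    hists = []
    for a, b in zip(axes, axes[1:]):
        counts = Counter()
        for r in records:
            ia = bin_index(r[a], *ranges[a], bins)
            ib = bin_index(r[b], *ranges[b], bins)
            counts[(ia, ib)] += 1
        hists.append(counts)
    return hists


def select(records, conditions):
    """Range query by linear scan (assumption: stand-in for an index lookup)."""
    return [
        r for r in records
        if all(lo <= r[k] <= hi for k, (lo, hi) in conditions.items())
    ]


# Hypothetical particle records: position x, momentum px, position y.
records = [
    {"x": 0.0, "px": 1.0, "y": 0.5},
    {"x": 1.0, "px": 9.0, "y": 0.1},
    {"x": 2.0, "px": 8.5, "y": 0.9},
    {"x": 3.0, "px": 2.0, "y": 0.4},
]
axes = ["x", "px", "y"]
ranges = {"x": (0.0, 4.0), "px": (0.0, 10.0), "y": (0.0, 1.0)}

hists = axis_pair_histograms(records, axes, ranges, bins=2)
high_px = select(records, {"px": (8.0, 10.0)})
focus_hists = axis_pair_histograms(high_px, axes, ranges, bins=2)
```

In this sketch, `hists` would serve as the context view of the whole dataset and `focus_hists` as the view of the selected subset; each histogram cell would be drawn as a band between two axes, shaded by its count.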
