Markov Chain Driven Multi-Dimensional Visual Pattern Analysis with Parallel Coordinates

Parallel coordinates is a widely used visualization technique for presenting, analyzing and exploring multidimensional data. However, like many other visualizations, it can suffer from overplotting when rendering large data sets. To date, quite a few methods have been proposed to discover and illustrate the major data trends in cluttered parallel coordinates. Among them, frequency-based approaches using binning and histograms are widely adopted. The traditional binning method, which records line-segment frequency, only considers data in a two-dimensional subspace; as a result, multi-dimensional features are not taken into account for trend and outlier analysis. Obtaining a coherent binned representation in higher dimensions is challenging because multidimensional binning can suffer from the curse of dimensionality. In this paper, we use a Markov chain model to compute an n-dimensional joint probability for each data tuple based on a two-dimensional binning method. This probability value can be used to guide the user during selection and brushing. We provide various interaction techniques that let the user control the parameters during the brushing process. Filtered data with a high probability measure often explicitly illustrates major data trends. To scale to large data sets, we also propose a more precise angular representation for angular histograms to depict the density of the brushed data trends. We demonstrate our methods and evaluate the results on a wide variety of data sets, including real-world, high-dimensional biological data.
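The idea of deriving an n-dimensional probability from two-dimensional bins can be illustrated with a minimal sketch. It assumes a first-order Markov chain over the axes in their display order, so that a tuple's probability is P(b_1) multiplied by the transition probabilities P(b_{d+1} | b_d) estimated from the 2D bin counts between adjacent axes. The equal-width binning, the bin count, and the function names `bin_indices` and `markov_chain_probability` are assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def bin_indices(data, n_bins):
    """Map every value to an equal-width bin index in [0, n_bins) per axis."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant axes
    idx = np.floor((data - lo) / span * n_bins).astype(int)
    return np.clip(idx, 0, n_bins - 1)      # the maximum value falls into the last bin

def markov_chain_probability(data, n_bins=10):
    """Per-tuple joint probability under an assumed first-order Markov chain
    over adjacent axes: P(b_1) * prod_d P(b_{d+1} | b_d)."""
    data = np.asarray(data, dtype=float)
    n_rows, n_dims = data.shape
    bins = bin_indices(data, n_bins)

    # Initial distribution: 1D histogram on the first axis.
    first_counts = np.bincount(bins[:, 0], minlength=n_bins)
    prob = first_counts[bins[:, 0]] / n_rows

    for d in range(n_dims - 1):
        src, dst = bins[:, d], bins[:, d + 1]
        # 2D histogram of line segments between axis d and axis d + 1.
        joint = np.zeros((n_bins, n_bins))
        np.add.at(joint, (src, dst), 1)
        # Transition probability: segment count divided by source-bin count.
        row_totals = joint.sum(axis=1)
        prob = prob * joint[src, dst] / row_totals[src]
    return prob

if __name__ == "__main__":
    sample = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 1, 1]])
    print(markov_chain_probability(sample, n_bins=2))
```

In this sketch, tuples that follow frequently used bin-to-bin transitions receive a high value, so thresholding the returned array is one plausible way to pick candidates for brushing. Only n - 1 two-dimensional histograms are stored, which sidesteps a full n-dimensional binning.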
