XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data

Background: Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious.

Results: In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician.

Conclusions: Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results.
