StratomeX: Visual Analysis of Large‐Scale Heterogeneous Genomics Data for Cancer Subtype Characterization

Identification and characterization of cancer subtypes are important areas of research that are based on the integrated analysis of multiple heterogeneous genomics datasets. Since there are no tools supporting this process, much of this work is done using ad‐hoc scripts and static plots, which is inefficient and limits visual exploration of the data. To address this, we have developed StratomeX, an integrative visualization tool that allows investigators to explore the relationships of candidate subtypes across multiple genomic data types such as gene expression, DNA methylation, or copy number data. StratomeX represents datasets as columns and subtypes as bricks in these columns. Ribbons between the columns connect bricks to show subtype relationships across datasets. Drill‐down features enable detailed exploration. StratomeX provides insights into the functional and clinical implications of candidate subtypes by employing small multiples, which allow investigators to assess the effect of subtypes on molecular pathways or outcomes such as patient survival. As the configuration of viewing parameters in such a multi‐dataset, multi‐view scenario is complex, we propose a meta visualization and configuration interface for dataset dependencies and data‐view relationships. StratomeX is developed in close collaboration with domain experts. We describe case studies that illustrate how investigators used the tool to explore subtypes in large datasets and demonstrate how they efficiently replicated findings from the literature and gained new insights into the data.
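
The column, brick, and ribbon layout described above can be illustrated with a minimal sketch. It is not the StratomeX implementation: it assumes each column is a mapping from brick (subtype) labels to sets of patient identifiers, and that a ribbon's width is the number of patients shared by two bricks in adjacent columns. The function name `ribbon_widths`, the brick labels, and the patient IDs are hypothetical and chosen only for illustration.

```python
from itertools import product


def ribbon_widths(left, right):
    """Count shared patients for every pair of bricks in two adjacent columns.

    `left` and `right` map brick labels to sets of patient IDs (an assumed
    representation, not taken from the paper). Pairs with no shared patients
    are omitted, since no ribbon would be drawn for them.
    """
    ribbons = {}
    for (l_name, l_patients), (r_name, r_patients) in product(left.items(), right.items()):
        shared = len(l_patients & r_patients)
        if shared:
            ribbons[(l_name, r_name)] = shared
    return ribbons


# Hypothetical toy data: two datasets stratified into candidate subtypes.
expression = {"E1": {"p1", "p2", "p3"}, "E2": {"p4", "p5"}}
methylation = {"M1": {"p1", "p2", "p4"}, "M2": {"p3", "p5"}}

ribbons = ribbon_widths(expression, methylation)
for (l_name, r_name), width in sorted(ribbons.items()):
    print(f"{l_name} -> {r_name}: {width} shared patient(s)")
```

In this toy example the widest ribbon connects `E1` and `M1`, which suggests the two candidate subtypes largely describe the same patients; thin ribbons spread across many bricks would instead suggest that the stratifications disagree.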
