Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new, complex data sets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss 1) the integration of data clustering and visualization into one framework, 2) the application of data clustering to 3D gene expression data, 3) the evaluation of the number of clusters k in the context of 3D gene expression clustering, and 4) the improvement of overall analysis quality via dedicated, visualization-based post-processing of clustering results. We also discuss how the framework can be used to define spatial pattern boundaries and temporal profiles of genes objectively and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
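To make points 2) and 3) concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes each cell is described by a vector of expression values for two hypothetical genes, clusters the cells with plain k-means (Lloyd's algorithm), and reports the within-cluster sum of squares (WCSS) for several values of k as one simple way to compare cluster counts. The toy data, the function names, and the farthest-point initialization are all illustrative assumptions.

```python
# Illustrative sketch only: toy data and helper names are assumptions,
# not part of the framework described in the text.

def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))

def init_centers(points, k):
    """Deterministic farthest-point initialization (an assumed choice)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    centers = init_centers(points, k)
    for _ in range(max_iters):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return labels, centers

def wcss(points, centers):
    """Within-cluster sum of squares: lower means tighter clusters."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Toy per-cell expression values for two hypothetical genes (gene_a, gene_b).
cells = [
    (0.10, 0.10), (0.12, 0.08), (0.08, 0.12), (0.11, 0.11),  # both genes low
    (0.90, 0.10), (0.88, 0.12), (0.92, 0.09), (0.91, 0.11),  # gene_a high
    (0.50, 0.90), (0.52, 0.88), (0.48, 0.91), (0.49, 0.92),  # gene_b high
]

labels, centers = kmeans(cells, 3)
scores = {k: wcss(cells, kmeans(cells, k)[1]) for k in (1, 2, 3)}
for k, score in scores.items():
    print(k, round(score, 4))
```

On data like this, the WCSS drops sharply until k reaches the number of well-separated expression groups. In practice, a curve of this kind is only one of several possible inputs to choosing k, and the resulting cluster labels would still need to be inspected visually in the embryo's physical space.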
