Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these complex new data sets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss 1) the integration of data clustering and visualization into one framework, 2) the application of data clustering to 3D gene expression data, 3) the evaluation of the number of clusters k in the context of 3D gene expression clustering, and 4) the improvement of overall analysis quality via dedicated postprocessing of clustering results based on visualization. We also discuss how this framework can be used to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
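
To make point 3) concrete, the following is a minimal sketch of how the number of clusters k might be evaluated. The abstract names neither the clustering algorithm nor the evaluation criterion, so the sketch assumes plain k-means (Lloyd's algorithm) and the within-cluster sum of squares as the score. The data are synthetic: each point stands for one cell with expression values for two hypothetical genes, and every function and variable name is illustrative rather than taken from the framework.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the per-dimension mean of the members.
                updated.append(tuple(sum(dim) / len(members)
                                     for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return assign(points, centroids), centroids


def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Synthetic "cells": three groups with different expression levels of two
# hypothetical genes (values are made up for illustration only).
rng = random.Random(1)
group_means = [(0.1, 0.8), (0.5, 0.2), (0.9, 0.7)]
points = [(rng.gauss(mx, 0.05), rng.gauss(my, 0.05))
          for mx, my in group_means for _ in range(40)]

# Score several candidate values of k.
wss = {}
for k in range(1, 7):
    labels, centroids = kmeans(points, k)
    wss[k] = within_cluster_ss(points, labels, centroids)
    print(f"k={k}  within-cluster SS={wss[k]:.4f}")
```

A common heuristic is to look for the k beyond which the score stops dropping sharply. In a framework like the one described here, such a score would presumably be one input alongside visual inspection of the clusters, not a replacement for it.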
