Exploratory and inferential analysis of gene cluster neighborhood graphs

Background: Many different clustering methods are used in gene expression data analysis to find groups of co-expressed genes. Clustering algorithms that can also visualize the resulting clusters are usually preferred, because visualization gives practitioners an understanding of the cluster structure of their data and makes the cluster results easier to interpret.

Results: This paper presents recent extensions of the R package gcExplorer, an interactive visualization toolbox for investigating the overall cluster structure as well as single clusters. The different visualization options, including arbitrary node and panel functions, are described in detail. Finally, the toolbox can be used to investigate the quality of a given clustering both graphically and theoretically, by testing the association between a partition and a functional group under study.

Conclusion: gcExplorer is shown to be a very helpful tool for the general exploration of microarray experiments. It makes the identification of potentially interesting gene candidates or functional groups substantially faster and easier. Inferential analysis of a cluster solution is used to judge its ability to provide insight into the underlying mechanistic biology of the experiment.
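To illustrate what testing the association between a partition and a functional group can look like, the following is a minimal sketch in Python and not the gcExplorer implementation. The abstract does not state which test the package uses, so the choice of a Pearson chi-square statistic with a Monte Carlo permutation p-value is an assumption, as are the function names `chi_square_statistic` and `permutation_p_value` and the toy data.

```python
import random
from collections import Counter


def chi_square_statistic(clusters, in_group):
    """Pearson chi-square statistic for the (cluster x group membership) table."""
    n = len(clusters)
    cluster_sizes = Counter(clusters)
    group_sizes = Counter(in_group)
    observed = Counter(zip(clusters, in_group))
    stat = 0.0
    for c, n_c in cluster_sizes.items():
        for g, n_g in group_sizes.items():
            # Expected count under independence of cluster and group labels.
            expected = n_c * n_g / n
            stat += (observed[(c, g)] - expected) ** 2 / expected
    return stat


def permutation_p_value(clusters, in_group, n_perm=2000, seed=0):
    """Monte Carlo p-value: shuffle group labels and recompute the statistic."""
    rng = random.Random(seed)
    observed_stat = chi_square_statistic(clusters, in_group)
    shuffled = list(in_group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_statistic(clusters, shuffled) >= observed_stat:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed_stat, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 60 genes in 3 clusters of 20 genes each;
# 12 of the 14 genes in the functional group fall into cluster 0.
clusters = [0] * 20 + [1] * 20 + [2] * 20
in_group = ([True] * 12 + [False] * 8
            + [True] * 1 + [False] * 19
            + [True] * 1 + [False] * 19)

stat, p_value = permutation_p_value(clusters, in_group)
print(f"chi-square = {stat:.2f}, permutation p-value = {p_value:.4f}")
```

A permutation p-value is used here because it stays valid when expected cell counts are small, which is common when a functional group contains only a few genes.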
