Adding Confidence to Gene Expression Clustering

It is well established that gene expression data contain large amounts of random variation, which affects both the analysis and the results of microarray experiments. Microarray data are typically either tested for differential expression between conditions or grouped according to expression profiles measured over time or across genetic or environmental conditions. Tests of differential expression attach a measure of statistical certainty (such as a p-value) to each result, which makes it possible to judge the relative worth of competing analyses. Cluster analysis, by contrast, is exploratory and has generally lacked any formal statistical inference. We aim to attach a level of confidence to inferred clusters of gene expression data by combining a novel dissimilarity function for identifying clusters with conditional randomization of the data space, which separates statistically significant clusters of expression patterns from those that could arise by chance. We randomize the data space with both permutation and convex hull approaches, and show that both methods can effectively identify gene expression profiles whose coregulation differs significantly from that expected by random chance alone.
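The permutation idea can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract. First, because the abstract does not define its novel dissimilarity function, 1 minus the Pearson correlation stands in for it. Second, "conditional randomization" is read here as shuffling each gene's values across conditions independently, which keeps every gene's own values but breaks any coregulation between genes. Third, the statistic is the mean pairwise dissimilarity within a fixed candidate cluster. The function names (`pearson_dissimilarity`, `mean_within_dissimilarity`, `permutation_p_value`) are hypothetical, and the convex hull variant is not shown.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def pearson_dissimilarity(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson correlation (assumed stand-in for the paper's own dissimilarity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # a flat profile is treated as uncorrelated with anything
    return 1.0 - sxy / math.sqrt(sxx * syy)


def mean_within_dissimilarity(profiles: Sequence[Sequence[float]]) -> float:
    """Average pairwise dissimilarity among the genes of one cluster."""
    if len(profiles) < 2:
        raise ValueError("a cluster needs at least two genes")
    pairs = list(combinations(profiles, 2))
    return sum(pearson_dissimilarity(a, b) for a, b in pairs) / len(pairs)


def permutation_p_value(
    profiles: Sequence[Sequence[float]], n_perm: int = 999, seed: int = 0
) -> Tuple[float, float]:
    """Return (observed statistic, permutation p-value) for one candidate cluster.

    A small p-value means the cluster is tighter (more coregulated) than
    clusters built from the same genes with their values shuffled at random.
    """
    rng = random.Random(seed)
    observed = mean_within_dissimilarity(profiles)
    as_tight = 0
    for _ in range(n_perm):
        shuffled: List[List[float]] = []
        for profile in profiles:
            values = list(profile)
            rng.shuffle(values)  # permute this gene's values across conditions
            shuffled.append(values)
        if mean_within_dissimilarity(shuffled) <= observed:
            as_tight += 1
    # Count the observed data as one member of the null distribution.
    return observed, (as_tight + 1) / (n_perm + 1)
```

For example, four genes that rise together across six conditions give a small observed statistic and a p-value near the minimum attainable value of 1/(n_perm + 1):

```python
cluster = [
    [1, 2, 3, 4, 5, 6],
    [2, 3, 4, 5, 6, 7],
    [1.1, 2.2, 2.9, 4.1, 5.0, 6.2],
    [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
]
observed, p = permutation_p_value(cluster, n_perm=999, seed=1)
```

This sketch holds cluster membership fixed. Because the clusters are themselves inferred from the same data, a fuller procedure would presumably re-cluster each randomized data set before computing the statistic, so that the null distribution also reflects the clustering step.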
