'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns

BackgroundLarge gene expression studies, such as those conducted using DNA arrays, often provide millions of different pieces of data. To address the problem of analyzing such data, we describe a statistical method, which we have called 'gene shaving'. The method identifies subsets of genes with coherent expression patterns and large variation across conditions. Gene shaving differs from hierarchical clustering and other widely used methods for analyzing gene expression studies in that genes may belong to more than one cluster, and the clustering may be supervised by an outcome measure. The technique can be 'unsupervised', that is, the genes and samples are treated as unlabeled, or partially or fully supervised by using known properties of the genes or samples to assist in finding meaningful groupings.ResultsWe illustrate the use of the gene shaving method to analyze gene expression measurements made on samples from patients with diffuse large B-cell lymphoma. The method identifies a small cluster of genes whose expression is highly predictive of survival.ConclusionsThe gene shaving method is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worth further investigation.

[1]  G. S. Johnson,et al.  An Information-Intensive Approach to the Molecular Pharmacology of Cancer , 1997, Science.

[2]  M. Holcombe,et al.  Information Processing in Cells and Tissues , 1998, Springer US.

[3]  I. Jonassen,et al.  Predicting gene regulatory elements in silico on a genomic scale. , 1998, Genome research.

[4]  G. Church,et al.  Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation , 1998, Nature Biotechnology.

[5]  P. D’haeseleer,et al.  Mining the gene expression matrix: inferring gene relationships from large scale gene expression data , 1998 .

[6]  J. Claverie,et al.  Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. , 1999, Genome research.

[7]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[8]  R. Tibshirani,et al.  Clustering methods for the analysis of DNA microarray data , 1999 .

[9]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[10]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[11]  C. Niehrs,et al.  Synexpression groups in eukaryotes , 1999, Nature.

[12]  Eric Mjolsness,et al.  From Coexpression to Coregulation: An Approach to Inferring Transcriptional Regulation among Gene Classes from Large-Scale Expression Data , 1999, NIPS.

[13]  Christian A. Rees,et al.  Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[14]  E. Sprinzak,et al.  Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes. , 1999, Genome research.

[15]  J. Mesirov,et al.  Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Trevor Hastie,et al.  Gene Shaving: a new class of clustering methods for expression arrays , 2000 .

[17]  B. Berger,et al.  Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction , 2000 .

[18]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[19]  D.,et al.  Regression Models and Life-Tables , 2022 .