Effects of threshold choice on biological conclusions reached during analysis of gene expression by DNA microarrays.

Global analysis of gene expression with DNA microarrays is increasingly used to search for differences in biological properties between normal and diseased tissue. In such studies, expression that deviates from defined thresholds is commonly used to create genetic signatures that characterize disease vs. normality. Although it is axiomatic that the threshold parameters applied to microarray analysis will alter the contents of such genetic signatures, the extent to which threshold choice can affect the fundamental conclusions drawn from microarray-based studies has not been elucidated. We used GABRIEL (Genetic Analysis By Rules Incorporating Expert Logic), a platform of knowledge-based algorithms for the global analysis of gene expression, together with conventional statistical approaches, to examine the sensitivity of conclusions to threshold choice in recently published microarray-based studies. This article focuses on an analysis of the effects of threshold decisions in one of these studies [Ramaswamy, S., Ross, K. N., Lander, E. S. & Golub, T. R. (2003) Nat. Genet. 33, 49-54], which arrived at the important conclusion that the metastatic potential of primary tumors is encoded by the bulk of cells in the tumor. We found that support for this conclusion depends strongly on the threshold used to create gene expression signatures. We also found that threshold choice dramatically affected which gene function categories were nonrandomly represented in signatures. Our results suggest that the robustness of biological conclusions drawn from microarray analysis should be routinely assessed by testing whether the conclusions hold across a range of threshold parameters.
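
The kind of threshold sweep recommended above can be illustrated with a minimal sketch. It is not GABRIEL and not the method of the cited study: the selection rule (absolute difference in mean log2 expression), the function names, the gene labels, and the toy values are all assumptions made for illustration. It only shows how a signature's size and its overlap with a reference signature can be tracked as the threshold changes.

```python
from statistics import mean


def signature(group_a, group_b, threshold):
    """Genes whose mean log2 expression differs between the two groups
    by more than `threshold` (a hypothetical selection rule)."""
    selected = set()
    for gene in group_a:
        diff = mean(group_a[gene]) - mean(group_b[gene])
        if abs(diff) > threshold:
            selected.add(gene)
    return selected


def jaccard(a, b):
    """Jaccard overlap of two gene sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def threshold_sweep(group_a, group_b, thresholds):
    """For each threshold, report (threshold, signature size, overlap with
    the signature obtained at the first threshold in the list)."""
    reference = signature(group_a, group_b, thresholds[0])
    rows = []
    for t in thresholds:
        sig = signature(group_a, group_b, t)
        rows.append((t, len(sig), jaccard(sig, reference)))
    return rows


# Toy log2 expression values, invented purely for illustration.
metastatic = {
    "geneA": [2.0, 2.2, 1.8],
    "geneB": [0.9, 1.1, 1.0],
    "geneC": [0.1, -0.1, 0.0],
    "geneD": [-1.6, -1.4, -1.5],
}
primary = {
    "geneA": [0.0, 0.1, -0.1],
    "geneB": [0.2, 0.3, 0.1],
    "geneC": [0.0, 0.1, -0.1],
    "geneD": [0.0, 0.0, 0.0],
}

for t, size, overlap in threshold_sweep(metastatic, primary, [0.5, 1.0, 1.75]):
    print(f"threshold={t:<5} genes={size} overlap_with_reference={overlap:.2f}")
```

In a real robustness check, any downstream conclusion (for example, a classifier's performance or the enrichment of functional categories) would be recomputed for each signature in the sweep, rather than only the signature size and overlap shown here.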
