Group testing for pathway analysis improves comparability of different microarray datasets

MOTIVATION The wide use of DNA microarrays for investigating the cell transcriptome has triggered the invention of numerous methods for processing microarray data and has led to a growing number of microarray studies that examine the same biological conditions. However, gene lists obtained by different statistical methods or from different datasets rarely agree. We aimed to examine such discrepancies at the level of biologically related groups of genes that appear to be affected, e.g. metabolic or signalling pathways. This can be achieved by group testing procedures, e.g. over-representation analysis, functional class scoring (FCS), or global tests.

RESULTS Three public prostate cancer datasets obtained with the same microarray platform (HGU95A/HGU95Av2) were analyzed. Each dataset was normalized by either variance stabilizing normalization (vsn) or mixed model normalization (MMN). Significance analysis of microarrays (SAM) was then applied to the vsn-normalized data, and mixed model analysis to the MMN-normalized data. For multiple testing adjustment, the false discovery rate was calculated and the threshold was set to 0.05. Gene lists from the same method applied to different datasets overlapped by 42 to 52%, while lists from different methods applied to the same dataset shared 63 to 85% of their genes. The six gene lists obtained by applying the two statistical methods to the three datasets were then subjected to group testing by Fisher's exact test. Group testing by gene set enrichment analysis (GSEA) and the global test was applied to the three datasets as well. Fisher's exact test, followed by the global test, gave more consistent results than GSEA with respect to the concordance between analyses of gene lists obtained by different methods and from different datasets. Nevertheless, all group testing methods identified pathways that have already been described as involved in the pathogenesis of prostate cancer.
Moreover, pathways recurrently identified in these analyses are more likely to be reliable than those from a single analysis on a single dataset.
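
To make the gene-list steps concrete, the following is a minimal sketch of over-representation analysis by a one-sided Fisher's exact test (computed as a hypergeometric upper tail), with Benjamini-Hochberg adjustment at the 0.05 threshold and a simple overlap measure between two gene lists. It is not the authors' implementation: all function names, gene identifiers, and pathway definitions are hypothetical toy values, and the overlap definition (shared genes relative to the shorter list) is an assumption, since the abstract does not state how overlap was computed.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway.

    This equals the one-sided Fisher's exact test p-value for enrichment.
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def over_representation(gene_list, pathways, universe):
    """Map each pathway name to its BH-adjusted enrichment p-value."""
    universe = set(universe)
    genes = set(gene_list) & universe
    names = sorted(pathways)
    pvals = []
    for name in names:
        members = set(pathways[name]) & universe
        hits = len(genes & members)
        pvals.append(
            hypergeom_upper_tail(hits, len(members), len(genes), len(universe))
        )
    return dict(zip(names, benjamini_hochberg(pvals)))


def overlap_percent(list_a, list_b):
    """Shared genes as a percentage of the shorter list (assumed definition)."""
    a, b = set(list_a), set(list_b)
    return 100.0 * len(a & b) / min(len(a), len(b))


# Hypothetical toy data: 20 genes on the array, two pathways of five genes each.
universe = [f"g{i}" for i in range(1, 21)]
pathways = {
    "pathway_a": [f"g{i}" for i in range(1, 6)],
    "pathway_b": [f"g{i}" for i in range(11, 16)],
}
gene_list = ["g1", "g2", "g3", "g4", "g16"]  # stand-in for a differential gene list

adjusted = over_representation(gene_list, pathways, universe)
significant = [name for name, p in adjusted.items() if p < 0.05]
shared = overlap_percent(["g1", "g2", "g3", "g4"], ["g3", "g4", "g5", "g6", "g7"])
```

In this toy setting only `pathway_a` passes the 0.05 threshold, because four of the five listed genes fall in it. Running the same function on gene lists from several datasets and intersecting the `significant` sets is one way to look for recurrently identified pathways.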
