Integrating Affymetrix microarray data sets using probe-level test statistic for predicting prostate cancer

Microarray technology has previously been used to identify genes that are differentially expressed between tumor and normal prostate samples, both in single studies and in syntheses of multiple studies. When integrating results from several Affymetrix microarray datasets, previous studies have used probeset-level data, which may lose information contained at the probe level. Here, we propose a new approach for combining results across studies, based on a probe-level test statistic. For each probeset, the probe-level test statistic is transformed into an effect size measure, and a random-effects model (REM) is used to integrate the effect sizes across studies. We compared the statistical and biological significance of the prognostic gene expression signatures identified by the probe-level model (PLM) with those identified by the probeset-level model (PSLM). Support vector machine (SVM)-based predictive models were built using these two sets of signatures, and their performance was evaluated on independent test datasets. Our analyses show that the prognostic gene expression signatures identified through the probe-level test statistic are more strongly differentially expressed and have better prediction accuracy than signatures derived from the probeset-level model.
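To make the integration step concrete, the following is a minimal sketch of how a per-study test statistic for one probeset could be turned into an effect size and pooled under a random-effects model. The abstract does not state which effect size or which between-study variance estimator was used, so the choices here (Hedges' g computed from a two-sample t statistic, and the DerSimonian-Laird moment estimator) are assumptions for illustration, as are the function names `t_to_effect_size` and `random_effects`.

```python
import math


def t_to_effect_size(t, n1, n2):
    """Convert a two-sample t statistic into Hedges' g and an approximate variance.

    Assumption: the probe-level statistic behaves like a two-sample t statistic
    with n1 tumor and n2 normal samples. The source does not confirm this.
    """
    d = t * math.sqrt(1.0 / n1 + 1.0 / n2)   # standardized mean difference
    df = n1 + n2 - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)          # small-sample bias correction
    g = j * d
    var = (n1 + n2) / (n1 * n2) + g * g / (2.0 * (n1 + n2))
    return g, var


def random_effects(effects, variances):
    """Pool per-study effect sizes with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, standard error, z score, between-study variance).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]          # fixed-effect (inverse-variance) weights
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add the between-study variance to each study.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    return mu, se, mu / se, tau2
```

In use, one would call `t_to_effect_size` once per study for a given probeset, pass the resulting effect sizes and variances to `random_effects`, and rank probesets by the pooled z score. When the studies agree closely, the estimated between-study variance is zero and the model reduces to a fixed-effect inverse-variance average.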
