Ranking differentially expressed genes from Affymetrix gene expression data: methods with reproducibility, sensitivity, and specificity

Background: To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility.

Results: We compared eight conventional methods for ranking genes, namely weighted average difference (WAD), average difference (AD), fold change (FC), rank products (RP), moderated t statistic (modT), significance analysis of microarrays (samT), shrinkage t statistic (shrinkT), and intensity-based moderated t statistic (ibmT), in combination with six preprocessing algorithms (PLIER, VSN, FARMS, multi-mgMOS (mmgMOS), MBEI, and GCRMA). A total of 36 real experimental datasets were evaluated on the basis of the area under the receiver operating characteristic curve (AUC) as a measure of both sensitivity and specificity. We found that the RP method performed well for VSN-, FARMS-, MBEI-, and GCRMA-preprocessed data, and the WAD method performed well for mmgMOS-preprocessed data. Our analysis of the MicroArray Quality Control (MAQC) project's datasets showed that the FC-based gene ranking methods (WAD, AD, FC, and RP) had a higher level of reproducibility: the percentages of overlapping genes (POGs) across different sites for the FC-based methods were higher overall than those for the t-statistic-based methods (modT, samT, shrinkT, and ibmT). In particular, POG values for WAD were the highest overall among the FC-based methods irrespective of the choice of preprocessing algorithm.

Conclusion: Our results demonstrate that to increase sensitivity, specificity, and reproducibility in microarray analyses, we need to select suitable combinations of preprocessing algorithms and gene ranking methods. We recommend the use of FC-based methods, in particular RP or WAD.
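
The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that POG is the percentage of genes shared by the top-k entries of two ranked gene lists (for example, lists from two test sites), and that AUC is computed from per-gene ranking scores against known true/false DEG labels using the pairwise (Mann-Whitney) formulation. The function names `pog` and `auc`, the parameter `k`, and the toy data are hypothetical.

```python
def pog(ranked_a, ranked_b, k):
    """Percentage of overlapping genes between the top-k of two ranked lists.

    Assumption: POG = 100 * |top_k(A) & top_k(B)| / k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(ranked_a[:k])
    top_b = set(ranked_b[:k])
    return 100.0 * len(top_a & top_b) / k


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: higher value means stronger evidence of differential expression.
    labels: 1 for a true DEG, 0 for a non-DEG.
    Each (DEG, non-DEG) pair counts 1 if the DEG scores higher, 0.5 on a tie.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one DEG and one non-DEG")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): gene rankings from two sites and scored genes.
site1 = ["g1", "g2", "g3", "g4"]
site2 = ["g2", "g1", "g5", "g3"]
pog_top2 = pog(site1, site2, k=2)
pog_top3 = pog(site1, site2, k=3)
auc_toy = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise AUC is quadratic in the number of genes, which is acceptable for a sketch; a rank-based implementation would be preferable for full-size arrays.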
