Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data

Background: A critical step in processing oligonucleotide microarray data is combining the information in multiple probes to produce a single number that best captures the expression level of an RNA transcript. Several systematic studies comparing multiple methods for array processing have used tightly controlled calibration data sets as the basis for comparison. Here we compare the performance of seven processing methods using two data sets originally collected for disease profiling studies. Emphasis is placed on understanding sensitivity for detecting differentially expressed genes in terms of two key statistical determinants: test statistic variability for non-differentially expressed genes, and test statistic size for truly differentially expressed genes.

Results: In the two data sets considered here, up to seven-fold variation across the processing methods was found in the number of genes detected at a given false discovery rate (FDR). The best performing methods called up to 90% of the same genes differentially expressed, had less variable test statistics under randomization, and had a greater number of large test statistics in the experimental data. Poor performance of one method was directly tied to a tendency to produce highly variable test statistic values under randomization. Based on an overall measure of performance, two of the seven methods (Dchip and a trimmed mean approach) are superior in the two data sets considered here. Two other methods (MAS5 and GCRMA-EB) are inferior, while results for the other three methods are mixed.

Conclusions: Choice of processing method has a major impact on differential expression analysis of microarray data. Previously reported performance analyses using tightly controlled calibration data sets are not highly consistent with the results reported here using data from human tissue samples. Performance of array processing methods in disease profiling and other realistic biological studies should be given greater consideration when comparing Affymetrix processing methods.
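The two determinants named in the abstract (how variable the test statistics are under randomization, and how large they are in the experimental data) can be made concrete with a small sketch. The Python code below is an illustration only and is not the procedure used in the study: it assumes a two-group design, a Welch t statistic per gene, and a simple permutation estimate of the FDR that conservatively treats every gene as null when counting expected false calls. The function names `t_stats` and `permutation_fdr`, the threshold, and the simulated data are all hypothetical.

```python
import numpy as np

def t_stats(x, labels):
    """Welch t statistic per gene (rows = genes, columns = samples)."""
    a, b = x[:, labels], x[:, ~labels]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def permutation_fdr(x, labels, threshold, n_perm=200, seed=0):
    """Return (genes called, estimated FDR) at |t| >= threshold.

    Assumption: the expected number of false calls is the mean count of
    |t| >= threshold over random relabelings of the samples, with no
    correction for the proportion of truly null genes.
    """
    rng = np.random.default_rng(seed)
    n_called = int((np.abs(t_stats(x, labels)) >= threshold).sum())
    null_counts = [
        (np.abs(t_stats(x, rng.permutation(labels))) >= threshold).sum()
        for _ in range(n_perm)
    ]
    expected_false = float(np.mean(null_counts))
    fdr = min(1.0, expected_false / n_called) if n_called else 0.0
    return n_called, fdr

# Simulated expression scores: 1000 genes, 10 samples per group.
rng = np.random.default_rng(1)
labels = np.array([True] * 10 + [False] * 10)
x = rng.normal(size=(1000, 20))
x[:100, :10] += 3.0  # first 100 genes are truly differentially expressed

n_called, fdr = permutation_fdr(x, labels, threshold=4.0)
print(n_called, round(fdr, 4))
```

Under this sketch, a processing method whose scores give more variable statistics under relabeling raises the expected false count at a given threshold, so fewer genes can be called at the same FDR; a method that yields larger statistics for truly changed genes raises the number called.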
