Reproducibility-Optimized Test Statistic for Ranking Genes in Microarray Studies

A principal goal of microarray studies is to identify genes that are differentially expressed under distinct conditions. In such studies, selecting an optimal test statistic is a crucial challenge, and the best choice depends on the type and amount of data under analysis. Previous studies on simulated or spike-in data sets offer little practical guidance on how to choose the best method for a given real data set. To address this, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic performs consistently well under various simulated conditions and on an Affymetrix spike-in data set. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given data set without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression but could be extended to a wide range of other applications.
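The idea of choosing a ranking statistic by its reproducibility can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the t-like family with an additive constant `a0` in the denominator, the top-`k` overlap between pairs of bootstrap resamples as the reproducibility measure, and the function names `stat`, `top_k`, `reproducibility`, and `select_a0`. It should not be read as the exact procedure of the paper.

```python
import numpy as np

def stat(x, y, a0):
    # Hypothetical family: |mean difference| / (a0 + standard error).
    # a0 = 0 is close to an ordinary t-type ranking; a large a0 approaches
    # ranking by mean difference. The tiny constant avoids division by zero.
    diff = x.mean(axis=1) - y.mean(axis=1)
    se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1]
                 + y.var(axis=1, ddof=1) / y.shape[1])
    return np.abs(diff) / (a0 + se + 1e-12)

def top_k(scores, k):
    # Indices of the k highest-scoring genes, as a set.
    return set(np.argsort(-scores)[:k].tolist())

def reproducibility(x, y, a0, k, n_boot, rng):
    # Average top-k overlap between pairs of bootstrap resamples
    # (samples are drawn with replacement within each group).
    overlaps = []
    for _ in range(n_boot):
        lists = []
        for _ in range(2):
            xi = rng.integers(0, x.shape[1], x.shape[1])
            yi = rng.integers(0, y.shape[1], y.shape[1])
            lists.append(top_k(stat(x[:, xi], y[:, yi], a0), k))
        overlaps.append(len(lists[0] & lists[1]) / k)
    return float(np.mean(overlaps))

def select_a0(x, y, candidates, k=50, n_boot=20, seed=0):
    # Pick the candidate constant whose ranking is most reproducible.
    rng = np.random.default_rng(seed)
    scores = {a0: reproducibility(x, y, a0, k, n_boot, rng)
              for a0 in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

Here `x` and `y` are genes-by-replicates arrays for the two conditions. A call such as `select_a0(x, y, [0.0, 0.5, 2.0], k=20)` returns the most reproducible candidate together with the overlap score of each candidate, so the choice is driven by the data rather than fixed in advance.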
