Selection and validation of normalization methods for c-DNA microarrays using within-array replications

MOTIVATION: Normalization of microarray data is essential for multiple-array analyses. Several normalization protocols have been proposed, each based on different biological or statistical assumptions. A fundamental question is whether a given protocol has effectively normalized the arrays. A related question is how to choose, for a given array, the method that normalizes its data most effectively.

RESULTS: We propose several techniques for comparing the effectiveness of different normalization methods. We approach the problem by constructing statistics that test for systematic biases in the expression profiles among duplicated spots within an array. The test statistics require estimates of the genewise variances, which we obtain using several novel methods, including empirical Bayes methods for moderating the genewise variances and smoothing methods for aggregating variance information. P-values are estimated from a normal or chi-square approximation. With the estimated P-values, we can choose the most appropriate method for normalizing a specific array and assess the extent to which systematic biases due to variations in experimental conditions have been removed. The effectiveness and validity of the proposed methods are illustrated by a carefully designed simulation study. The methods are further illustrated by applications to three data sets: human placenta cDNAs comprising a large number of clones with replications; a customized microarray experiment carrying just a few hundred genes, from a study of the molecular roles of interferons in tumors; and the Agilent microarrays carrying tens of thousands of total RNA samples in the MAQC project, a study of the reproducibility, sensitivity and specificity of the data.

AVAILABILITY: Code implementing the method in the statistical package R is available from the authors.
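To make the idea of testing for residual bias among within-array replicates concrete, here is a minimal Python sketch. It is not the authors' procedure or their R code. It assumes every gene is spotted the same number of times, that column i of the input holds the normalized log-ratio from replicate position i, and that genewise variances are moderated by simple shrinkage toward the pooled variance. The function name `replicate_bias_test` and the constant `prior_df` are hypothetical, and the smoothing-based variance aggregation mentioned in the abstract is not attempted.

```python
import math
from statistics import fmean


def replicate_bias_test(y, prior_df=4.0):
    """Test each replicate position for systematic bias after normalization.

    y: list of per-gene lists; y[g][i] is the normalized log-ratio of gene g
       at replicate position i (same number of replicates for every gene).
    Returns (z, p): one z-score and one two-sided normal p-value per position.
    """
    n_rep = len(y[0])
    df = n_rep - 1

    # Residual of each spot from its gene mean; the true expression cancels.
    resid = []
    for row in y:
        m = fmean(row)
        resid.append([v - m for v in row])

    # Raw genewise variances on df = n_rep - 1 degrees of freedom.
    s2 = [sum(r * r for r in row) / df for row in resid]

    # Moderated variances: shrink each raw variance toward the pooled mean.
    # prior_df is an assumed tuning constant, not a value from the paper.
    s2_pool = fmean(s2)
    s2_mod = [(prior_df * s2_pool + df * v) / (prior_df + df) for v in s2]

    # Inverse-variance weights. With no bias, Var(resid[g][i]) equals
    # sigma_g^2 * (1 - 1/n_rep), so the weighted column sum has variance
    # of roughly (1 - 1/n_rep) * sum(weights).
    w = [1.0 / v for v in s2_mod]
    scale = math.sqrt((1.0 - 1.0 / n_rep) * sum(w))

    z = [sum(wg * row[i] for wg, row in zip(w, resid)) / scale
         for i in range(n_rep)]
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = [math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z]
    return z, p
```

Applying the function to the same array under each candidate normalization gives one set of p-values per method. Small p-values suggest that systematic differences between replicate positions remain. Because the weights are estimated from the same residuals, the normal reference distribution is only a rough approximation here.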
