Pre-processing Agilent microarray data

Background

Pre-processing methods for two-sample long oligonucleotide arrays, specifically the Agilent technology, have not been extensively studied. The goal of this study is to quantify some of the sources of error that affect measurement of expression using Agilent arrays and to compare Agilent's Feature Extraction software with pre-processing methods that have become the standard for normalization of cDNA arrays. These include log transformation followed by loess normalization, with or without background subtraction, and often a between-array scale normalization procedure. The larger goal is to define best study design and pre-processing practices for Agilent arrays, and we offer some suggestions.

Results

Simple loess normalization without background subtraction produced the lowest variability. However, without background subtraction, fold changes were biased towards zero, particularly at low intensities. ROC analysis of a spike-in experiment showed that differentially expressed genes are most reliably detected when background is not subtracted. Loess normalization with no background subtraction yielded an AUC of 99.7%, compared with 88.8% for Agilent-processed fold changes. All methods performed well when error was taken into account by t- or z-statistics (AUCs ≥ 99.8%). A substantial proportion of genes showed dye effects: 43% (99% CI: 39%, 47%). However, these effects were generally small regardless of the pre-processing method.

Conclusion

Simple loess normalization without background subtraction resulted in low-variance fold changes that more reliably ranked gene expression than the other methods. While t-statistics and other measures that take variation into account, including Agilent's z-statistic, can also be used to reliably select differentially expressed genes, fold changes are a standard measure of differential expression for exploratory work, cross-platform comparison, and biological interpretation, and cannot be entirely replaced. Although dye effects are small for most genes, many array features are affected. Therefore, an experimental design that incorporates dye swaps or a common reference could be valuable.
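
Two of the quantities discussed above can be illustrated with a small sketch. It is not the study's pipeline: the intensities, the constant additive background, and the function names `log2_ratio` and `auc` are all assumptions chosen for illustration. The first function shows why a log ratio computed without background subtraction shrinks toward zero at low intensity; the second computes an AUC for ranking spike-in features by a score, using the rank (Mann-Whitney) formulation of the area under the ROC curve.

```python
import math


def log2_ratio(red, green, background=0.0):
    """Log2 fold change after subtracting a constant background
    (a hypothetical, simplified background model) from both channels."""
    return math.log2((red - background) / (green - background))


def auc(scores, is_spike):
    """Area under the ROC curve: the probability that a randomly chosen
    spike-in feature scores higher than a randomly chosen non-spike
    feature, with ties counted as one half."""
    pos = [s for s, y in zip(scores, is_spike) if y]
    neg = [s for s, y in zip(scores, is_spike) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative low-intensity feature: true signals of 100 and 50
# (a 2-fold change) sitting on an assumed additive background of 200.
raw_red, raw_green = 300.0, 250.0
m_unsubtracted = log2_ratio(raw_red, raw_green)                  # log2(1.2), about 0.26
m_subtracted = log2_ratio(raw_red, raw_green, background=200.0)  # log2(2) = 1.0

# Toy ranking: absolute log ratios for four features, two of them spike-ins.
toy_auc = auc([1.0, 0.8, 0.3, 0.1], [True, True, False, False])
```

In this toy case the unsubtracted log ratio is attenuated relative to the subtracted one, which mirrors the bias reported in the Results. The sketch does not model the extra variance that background subtraction adds at low intensities, which is the other side of the trade-off described above.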
