Comparison of Affymetrix data normalization methods using 6,926 experiments across five array generations

Background

Gene expression microarray technologies are widely used across most areas of biological and medical research. Comparing and integrating microarray data from different experiments would be very useful, but it is currently very challenging because experimental and hybridization conditions differ between experiments, as do the data preprocessing and normalization methods. Furthermore, even for the widely used, industry-standard Affymetrix oligonucleotide microarrays, the various array generations have different probe sets representing different genes, which hinders data integration.

Results

In this study, our objective is to find systematic approaches for normalizing data that originate from different Affymetrix array generations and from different laboratories. We compare and assess the accuracy of five normalization methods for Affymetrix gene expression data using 6,926 Affymetrix experiments from five array generations. The methods compared are 1) standardization, 2) housekeeping gene based normalization, 3) equalized quantile normalization, 4) Weibull distribution based normalization and 5) array generation based gene centering. Our results indicate that the best results are achieved when the data are normalized first within each sample and then between samples with Array Generation based gene Centering (AGC) normalization.

Conclusion

We conclude that integrating different Affymetrix datasets with the AGC method yields values that are significantly more comparable across array generations than when no array generation based normalization is used. AGC was found to be the best method for normalizing data from several different array generations, and it achieves comparable gene values across thousands of samples.
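The abstract names array generation based gene centering but does not spell out its formula. As a rough illustration only, the sketch below takes the name literally and assumes that each gene is centered on its mean across the samples of one array generation, on values that have already been normalized within each sample. The function name, the generation labels and the toy matrix are all hypothetical, and the published method may differ in its details.

```python
import numpy as np


def center_genes_by_generation(expr, generations):
    """Center each gene within each array generation.

    ASSUMPTION: this is one plausible reading of "array generation based
    gene centering", not the published definition of AGC.

    expr        : genes x samples array, already normalized within each sample
    generations : one array-generation label per sample (column)
    """
    expr = np.asarray(expr, dtype=float)
    generations = np.asarray(generations)
    if expr.shape[1] != generations.shape[0]:
        raise ValueError("need exactly one generation label per sample")

    centered = np.empty_like(expr)
    for gen in np.unique(generations):
        cols = generations == gen
        # Per-gene mean over the samples of this generation only;
        # NaN marks genes that are not measured on a given array.
        gene_means = np.nanmean(expr[:, cols], axis=1, keepdims=True)
        centered[:, cols] = expr[:, cols] - gene_means
    return centered


# Toy data: 2 genes, 4 samples from two hypothetical generations.
expr = np.array([[1.0, 3.0, 10.0, 14.0],
                 [2.0, 2.0, 5.0, 7.0]])
generations = ["gen_a", "gen_a", "gen_b", "gen_b"]
centered = center_genes_by_generation(expr, generations)
```

After centering, every gene has mean zero within each generation, so a systematic offset between generations (here, the larger raw values in `gen_b`) no longer dominates comparisons across samples.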
