Establishing a major cause of discrepancy in the calibration of Affymetrix GeneChips

Background
Affymetrix GeneChips are a popular platform for performing whole-genome experiments on the transcriptome. There is a range of different calibration steps, and users are presented with a choice of background subtractions, normalisations and expression measures. We wished to establish which of the calibration steps results in the biggest uncertainty in the sets of genes reported to be differentially expressed.

Results
Our results indicate that the sets of genes identified as being most significantly differentially expressed, as estimated by the z-score of fold change, are relatively insensitive to the choice of background subtraction and normalisation. However, the contents of the gene list are most sensitive to the choice of expression measure. This holds irrespective of whether the experiment uses a rat, mouse or human chip, and of whether the chip definition is made using probe mappings from UniGene, RefSeq, Entrez Gene or the original Affymetrix definitions. It also holds irrespective of whether both Present and Absent Calls, or just Present Calls, from the MAS5 algorithm are used to filter gene lists, and the conclusion holds for genes of differing intensities. We reach the same conclusion after assigning genes as differentially expressed using t-statistics, although this approach yields a large number of false positives in the sets of genes identified, owing to the small numbers of replicates typically used in microarray experiments.

Conclusion
The major calibration uncertainty that biologists need to consider when analysing Affymetrix data is how the multiple probe values of each probe set are condensed into one expression measure.
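The abstract does not spell out how the z-score of fold change is computed or how gene lists are compared. A minimal sketch, assuming each gene's log2 fold change is standardised against the mean and standard deviation of the log2 fold changes of all genes on the chip, and assuming two calibration pipelines are compared by the fraction of their top-n genes they share. The function names, the overlap measure and the toy data are hypothetical, for illustration only.

```python
import statistics


def fold_change_z_scores(log_a, log_b):
    """Standardised log2 fold change (B vs A) for every gene.

    log_a, log_b: dict mapping gene ID -> replicate expression values already on
    the log2 scale (assumption: the output of some summarisation step).
    """
    # Log2 fold change: difference of the replicate means on the log2 scale.
    lfc = {g: statistics.mean(log_b[g]) - statistics.mean(log_a[g]) for g in log_a}
    values = list(lfc.values())
    mu, sd = statistics.mean(values), statistics.stdev(values)
    # z-score of each gene's fold change relative to all genes on the chip.
    return {g: (v - mu) / sd for g, v in lfc.items()}


def top_genes(z, n):
    """The n genes with the largest |z| (up- or down-regulated); ties broken by ID."""
    return set(sorted(z, key=lambda g: (-abs(z[g]), g))[:n])


def list_overlap(z1, z2, n):
    """Fraction of the top-n genes shared by two calibration pipelines."""
    return len(top_genes(z1, n) & top_genes(z2, n)) / n


# Hypothetical toy data standing in for two calibration pipelines;
# for brevity only the treated arrays differ between the pipelines.
control = {g: [5.0, 5.0] for g in ("g1", "g2", "g3", "g4")}
treated_p1 = {"g1": [8.0, 8.0], "g2": [5.0, 5.0], "g3": [5.0, 5.0], "g4": [2.0, 2.0]}
treated_p2 = {"g1": [8.0, 8.0], "g2": [3.0, 3.0], "g3": [5.0, 5.0], "g4": [4.0, 4.0]}
z1 = fold_change_z_scores(control, treated_p1)
z2 = fold_change_z_scores(control, treated_p2)
print(sorted(top_genes(z1, 2)))  # ['g1', 'g4']
print(sorted(top_genes(z2, 2)))  # ['g1', 'g2']
print(list_overlap(z1, z2, 2))  # 0.5
```

Ranking by |z| rather than by signed z keeps both up- and down-regulated genes in the list; an overlap of 1.0 would mean the two pipelines agree completely on the top-n genes.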

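The conclusion turns on how the multiple probe values of a probe set are condensed into one expression measure. As one illustration (not the authors' code), the sketch below contrasts Tukey's median polish, the summarisation step used by RMA, with a naive mean of log2 probe intensities. The function names and the toy probe set are hypothetical; the convergence rule follows the one used by R's `medpolish`.

```python
import statistics


def median_polish(x, max_iter=10, eps=0.01):
    """Tukey's median polish of a probes-by-arrays matrix of log2 intensities.

    Fits x[i][j] = overall + row_eff[i] + col_eff[j] + residual[i][j]
    by alternately sweeping out row (probe) and column (array) medians.
    """
    z = [list(row) for row in x]  # working copy; ends up holding the residuals
    n_rows, n_cols = len(z), len(z[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the probe effects.
        for i in range(n_rows):
            m = statistics.median(z[i])
            z[i] = [v - m for v in z[i]]
            row_eff[i] += m
        m = statistics.median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep column medians out of the residuals into the array effects.
        for j in range(n_cols):
            m = statistics.median(z[r][j] for r in range(n_rows))
            for r in range(n_rows):
                z[r][j] -= m
            col_eff[j] += m
        m = statistics.median(row_eff)
        row_eff = [e - m for e in row_eff]
        overall += m
        # Stop once the total absolute residual stops changing.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff


def rma_style_summary(x):
    """Expression measure per array: overall effect plus that array's effect."""
    overall, _, col_eff = median_polish(x)
    return [overall + c for c in col_eff]


def mean_summary(x):
    """Naive alternative: average the log2 probe intensities on each array."""
    return [statistics.mean(col) for col in zip(*x)]


# Hypothetical probe set: rows are probes, columns are two arrays.
# The third probe is very bright on the second array only.
probes = [[5.0, 6.0], [6.0, 7.0], [7.0, 20.0]]
print(rma_style_summary(probes))  # [6.0, 7.0]
print(mean_summary(probes))  # [6.0, 11.0]
```

On this toy probe set, the single aberrant probe moves the naive mean by 5 log2 units between arrays, whereas median polish reports a log2 fold change of 1, matching what the other probes show. Differences of this kind between summarisation methods are one way the choice of expression measure can change which genes reach the top of a list.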