Comparison of gene expression microarray data with count-based RNA measurements informs microarray interpretation

Background

Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study.

Results

Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this “gold-standard” comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues.

Conclusions

Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.
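The two quantities discussed above, cross-platform correlation and signal compression, can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names (`pearson`, `ols_slope`, `compare_gene`), the log2(count + 1) transform, and the toy numbers are all assumptions made for illustration. For a single gene measured across several samples, the Pearson correlation between log2 counts and log2 microarray signal summarizes agreement, and a regression slope below 1 (microarray on counts) is one simple indicator of compression.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def compare_gene(array_log2, counts):
    """Compare one gene across samples on the two platforms.

    array_log2: log2 microarray expression values, one per sample.
    counts:     raw digital molecule counts for the same samples.
    Returns (correlation, slope); a slope below 1 suggests compression.
    """
    # Assumed transform: log2(count + 1), so that zero counts stay defined.
    log_counts = [math.log2(c + 1) for c in counts]
    return pearson(log_counts, array_log2), ols_slope(log_counts, array_log2)


# Hypothetical toy values, not data from the study.
toy_counts = [15, 63, 255, 1023]       # log2(count + 1) = 4, 6, 8, 10
toy_array = [7.0, 8.0, 9.0, 10.0]      # microarray log2 signal
r, slope = compare_gene(toy_array, toy_counts)
```

In this toy case the two platforms agree perfectly on sample ordering, yet a six-unit change in log2 counts maps to only a three-unit change in microarray signal. That is the kind of compression the abstract describes, here held constant rather than varying with expression level.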
