Improving reliability and absolute quantification of human brain microarray data by filtering and scaling probes using RNA-Seq

Background: High-throughput sequencing is gradually replacing microarrays as the preferred method for studying mRNA expression levels, providing nucleotide resolution and accurately measuring absolute expression levels of almost any transcript, known or novel. However, existing microarray data from clinical, pharmaceutical, and academic settings represent valuable and often underappreciated resources, and methods for assessing and improving the quality of these data are lacking.

Results: To quantitatively assess the quality of microarray probes, we directly compare RNA-Seq to Agilent microarrays by processing 231 unique samples from the Allen Human Brain Atlas using RNA-Seq. Both techniques provide highly consistent, highly reproducible gene expression measurements in adult human brain, with RNA-Seq slightly outperforming microarray results overall. We show that RNA-Seq can be used as ground truth to assess the reliability of most microarray probes, remove probes with off-target effects, and scale probe intensities to match the expression levels identified by RNA-Seq. These sequencing scaled microarray intensities (SSMIs) provide more reliable, quantitative estimates of absolute expression levels for many genes when compared with unscaled intensities. Finally, we validate this result in two human cell lines, showing that linear scaling factors can be applied across experiments using the same microarray platform.

Conclusions: Microarrays provide consistent, reproducible gene expression measurements, which are improved using RNA-Seq as ground truth. We expect that our strategy could be used to improve probe quality for many data sets from major existing repositories.
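The filter-and-scale idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how probe reliability is scored or how the linear scaling factors are fitted, so everything specific here is an assumption for illustration: the function name `filter_and_scale_probes`, the Pearson correlation cutoff `min_corr`, the least-squares slope through the origin used as the per-probe scaling factor, and the requirement that both matrices are on the same linear scale with matched probes (rows) and samples (columns).

```python
import numpy as np


def filter_and_scale_probes(microarray, rnaseq, min_corr=0.5):
    """Hypothetical sketch, not the authors' published procedure.

    microarray, rnaseq: arrays of shape (n_probes, n_samples). Row i of
    `rnaseq` holds the RNA-Seq expression of the gene targeted by probe i,
    and columns are the same samples in the same order.

    Returns (keep, factors, scaled):
      keep    - boolean mask of probes whose correlation with RNA-Seq
                is at least `min_corr` (assumed reliability criterion)
      factors - one linear scaling factor per probe (all probes)
      scaled  - scaled intensities for the kept probes only
    """
    ma = np.asarray(microarray, dtype=float)
    rs = np.asarray(rnaseq, dtype=float)
    if ma.shape != rs.shape:
        raise ValueError("microarray and rnaseq must have the same shape")

    # Per-probe Pearson correlation across samples.
    ma_c = ma - ma.mean(axis=1, keepdims=True)
    rs_c = rs - rs.mean(axis=1, keepdims=True)
    cov = (ma_c * rs_c).sum(axis=1)
    denom = np.sqrt((ma_c ** 2).sum(axis=1) * (rs_c ** 2).sum(axis=1))
    # Probes with no variance get correlation 0 and are filtered out.
    corr = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    keep = corr >= min_corr

    # Least-squares slope through the origin: rnaseq ~ factor * microarray.
    num = (ma * rs).sum(axis=1)
    ss = (ma ** 2).sum(axis=1)
    factors = np.divide(num, ss, out=np.zeros_like(num), where=ss > 0)

    scaled = ma[keep] * factors[keep, None]
    return keep, factors, scaled


if __name__ == "__main__":
    # Toy data: probe 0 tracks RNA-Seq, probe 1 is anti-correlated,
    # probe 2 is flat (uninformative).
    ma = [[1, 2, 3, 4], [1, 2, 3, 4], [5, 5, 5, 5]]
    rs = [[2, 4, 6, 8], [4, 3, 2, 1], [1, 2, 3, 4]]
    keep, factors, scaled = filter_and_scale_probes(ma, rs)
    print(keep, factors[keep], scaled)
```

Because the abstract reports that linear scaling factors transfer across experiments on the same microarray platform, factors fitted this way on one data set could in principle be multiplied into intensities from another; in this sketch that would simply mean reusing `factors[keep]` on the corresponding rows of a new matrix.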
