Parallel comparison of Illumina RNA-Seq and Affymetrix microarray platforms on transcriptomic profiles generated from 5-aza-deoxy-cytidine treated HT-29 colon cancer cells and simulated datasets

Background: High-throughput parallel sequencing (RNA-Seq) has recently emerged as an appealing alternative to microarrays for identifying differentially expressed genes (DEGs) between biological groups. However, considerable discrepancies remain between the two platforms in both gene expression measurements and DEG results. The objective of this study was to compare parallel paired-end RNA-Seq and microarray data generated on 5-aza-deoxy-cytidine (5-Aza) treated HT-29 colon cancer cells, supplemented by a simulation study.

Methods: We first performed a general correlation analysis comparing gene expression profiles on both platforms. An Errors-In-Variables (EIV) regression model was then applied to assess proportional and fixed biases between the two technologies. Next, several existing algorithms designed for DEG identification in RNA-Seq and microarray data were applied to compare the cross-platform overlap of the resulting DEG lists, which were further validated using qRT-PCR assays on selected genes. Functional analyses were subsequently conducted using Ingenuity Pathway Analysis (IPA).

Results: Pearson and Spearman correlation coefficients between the RNA-Seq and microarray data each exceeded 0.80, with 66% to 68% overlap of genes between the two platforms. The EIV regression model indicated the existence of both fixed and proportional biases between the two platforms. The DESeq and baySeq algorithms (RNA-Seq) and the SAM and eBayes algorithms (microarray) achieved the highest cross-platform overlap rate in DEG results from both the experimental and simulated datasets. On the simulated dataset, DESeq controlled the false discovery rate better than baySeq, although it was slightly less sensitive. RNA-Seq and qRT-PCR data, but not microarray data, confirmed the expected reversal of SPARC gene suppression after treating HT-29 cells with 5-Aza. Thirty-three IPA canonical pathways were identified by both microarray and RNA-Seq data, 152 pathways by RNA-Seq data only, and none by microarray data only.

Conclusions: These results suggest that RNA-Seq has advantages over microarrays in identifying DEGs, with the most consistent results generated by the DESeq and SAM methods. The EIV regression model reveals both fixed and proportional biases between RNA-Seq and microarray measurements. These biases may explain in part why the cross-platform overlap in DEG lists is lower than the overlap in detectable genes.
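To illustrate how an errors-in-variables fit separates the two kinds of bias, the following is a minimal sketch using Deming regression, one common EIV estimator. The abstract does not state which EIV estimator was used, so the choice of Deming regression, the function name `deming_fit`, the error-variance ratio `delta = 1.0`, and the toy values are all assumptions made for illustration. In a method-comparison setting, an intercept different from 0 points to fixed bias and a slope different from 1 points to proportional bias.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming (errors-in-variables) regression of y on x.

    delta is the assumed ratio of the error variance in y to the error
    variance in x; delta = 1.0 gives orthogonal regression.
    Returns (slope, intercept).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sample variances and covariance.
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")

    # Closed-form Deming slope, then the intercept through the centroid.
    d = s_yy - delta * s_xx
    slope = (d + math.sqrt(d * d + 4.0 * delta * s_xy * s_xy)) / (2.0 * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical log2 expression values for the same genes on two
    # platforms (made-up numbers, not data from the study).
    microarray = [4.1, 5.0, 6.2, 7.1, 8.3, 9.0, 10.2]
    rna_seq = [3.0, 4.4, 6.1, 7.3, 9.0, 10.1, 11.8]

    slope, intercept = deming_fit(microarray, rna_seq)
    print(f"slope (proportional bias if != 1): {slope:.3f}")
    print(f"intercept (fixed bias if != 0): {intercept:.3f}")
```

Unlike ordinary least squares, this estimator allows for measurement error in both variables, which is why it is a natural fit when neither platform can be treated as an error-free reference. In practice, confidence intervals for the slope and intercept (for example from a bootstrap) would be needed before declaring either bias present.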
