Statistical strategies for microRNAseq batch effect reduction.

RNAseq is replacing microarrays as the tool of choice for gene expression profiling. Although RNAseq provides much richer data than microarrays, analyzing those data has proved much more challenging. Among the many difficulties of RNAseq analysis, correctly adjusting for batch effects is pivotal for large-scale RNAseq-based studies. Batch effects in RNAseq data are most obvious in microRNA (miRNA) sequencing studies. Using real miRNA sequencing (miRNAseq) data, we evaluated several batch effect removal techniques and discuss their effectiveness. We illustrate that adjusting for batch effects allows more reliable differentially expressed genes to be identified. Our study of batch effects in miRNAseq data can serve as a guideline for future miRNAseq studies that might contain batch effects.
