Evaluation of normalization methods in mammalian microRNA-Seq data.

Simple total tag count normalization is inadequate for microRNA sequencing data generated by next-generation sequencing technology. However, a systematic evaluation of normalization methods for microRNA sequencing data has so far been lacking. We comprehensively evaluate seven commonly used normalization methods: global normalization, Lowess normalization, the trimmed mean of M-values (TMM), quantile normalization, scaling normalization, variance stabilization, and the invariant method. We assess these methods on two separate experimental data sets using two empirical statistical metrics, the mean square error (MSE) and the Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods against quantitative PCR (qPCR) validation results. Our results consistently show that Lowess normalization and quantile normalization perform best, whereas TMM, a method developed for RNA-Seq normalization, performs worst. The poor performance of TMM normalization is further evidenced by abnormal results in differential expression (DE) testing of microRNA-Seq data. Compared with the choice of DE model, the choice of normalization method is the primary factor affecting DE results. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution.
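The abstract names global normalization, quantile normalization, the MSE, and the K-S statistic without spelling them out. The following is a minimal standard-library Python sketch under stated assumptions, not the authors' implementation: rows are microRNAs and columns are samples; "global normalization" is assumed here to mean rescaling every sample to the mean total count (the paper may use a different global factor); ties in quantile normalization are broken by order of appearance rather than averaged; and the metrics are applied, as an illustrative assumption, to a pair of samples, where smaller values mean the samples look more alike after normalization. All function names are hypothetical.

```python
from bisect import bisect_right
from statistics import fmean

# Hypothetical layout: rows are microRNAs (features), columns are samples.
Matrix = list[list[float]]


def columns(m: Matrix) -> list[list[float]]:
    """Transpose a row-major matrix into a list of sample columns."""
    return [list(col) for col in zip(*m)]


def global_scale(m: Matrix) -> Matrix:
    """Rescale every sample so its total count equals the mean total count.

    Assumption: one simple form of global (total-count) scaling, not
    necessarily the exact variant evaluated in the paper.
    """
    totals = [sum(col) for col in columns(m)]
    target = fmean(totals)
    return [[x * target / t for x, t in zip(row, totals)] for row in m]


def quantile_normalize(m: Matrix) -> Matrix:
    """Map every sample onto a shared reference distribution.

    The reference value at rank k is the mean of the k-th smallest value
    across samples; each entry is replaced by the reference value at its
    within-sample rank. Ties are broken by order of appearance (stable sort).
    """
    cols = columns(m)
    n = len(m)
    # orders[j][k] is the row index of the k-th smallest value in sample j.
    orders = [sorted(range(n), key=col.__getitem__) for col in cols]
    reference = [fmean(col[order[k]] for col, order in zip(cols, orders))
                 for k in range(n)]
    out = [[0.0] * len(cols) for _ in range(n)]
    for j, order in enumerate(orders):
        for k, i in enumerate(order):
            out[i][j] = reference[k]
    return out


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared difference between two equally long samples."""
    return fmean((x - y) ** 2 for x, y in zip(a, b))


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the two empirical cumulative distribution functions."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


if __name__ == "__main__":
    # Toy counts: 4 microRNAs x 3 samples (illustrative numbers only).
    counts = [[5.0, 4.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.0, 6.0], [4.0, 2.0, 8.0]]
    for name, norm in [("global", global_scale(counts)),
                       ("quantile", quantile_normalize(counts))]:
        s1, s2 = columns(norm)[:2]
        print(name, "MSE:", round(mse(s1, s2), 3), "K-S:", ks_statistic(s1, s2))
```

Because quantile normalization assigns every sample exactly the same set of reference values, the K-S statistic between two quantile-normalized samples is zero by construction. That is a property of the method rather than evidence of its quality, so a fair comparison across methods has to apply the metrics to quantities that normalization does not trivially equalize.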
