quantro: a data-driven approach to guide the choice of an appropriate normalization method

Normalization is an essential step in the analysis of high-throughput data. Multi-sample global normalization methods, such as quantile normalization, have been successfully used to remove technical variation. However, these methods rely on the assumption that observed global changes across samples are due to unwanted technical variability. When that assumption fails, applying global normalization methods can remove biologically driven variation. Currently, it is up to subject matter experts to determine whether the stated assumptions are appropriate. Here, we propose a data-driven alternative. We demonstrate the utility of our method (quantro) through examples and simulations. A software implementation is available from http://www.bioconductor.org/packages/release/bioc/html/quantro.html.
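
To make the assumption behind global normalization concrete, the following is a minimal sketch of quantile normalization in Python with NumPy. It is not the quantro method and not code from the quantro package; the function name `quantile_normalize`, the simulated data, and the size of the group shift are all illustrative assumptions. The sketch shows that quantile normalization forces every sample onto the same distribution, so a global difference between two groups disappears whether it was technical or biological.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a features-by-samples array.

    Illustrative sketch only: ties are broken by order of appearance,
    which is a simplification of what production implementations do.
    """
    x = np.asarray(x, dtype=float)
    # Row indices that would sort each column independently.
    order = np.argsort(x, axis=0, kind="stable")
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    # The k-th smallest entry of every column receives reference[k].
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out


# Simulated data (assumption): two groups of three samples each, where
# the second group carries a global shift of 2 units.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=(1000, 3))
group_b = rng.normal(loc=7.0, scale=1.0, size=(1000, 3))
data = np.hstack([group_a, group_b])

normalized = quantile_normalize(data)

# Difference in group means before and after normalization.
gap_before = data[:, 3:].mean() - data[:, :3].mean()
gap_after = normalized[:, 3:].mean() - normalized[:, :3].mean()
print(f"group gap before: {gap_before:.3f}")
print(f"group gap after:  {gap_after:.3f}")
```

In this simulation the gap between groups is close to 2 before normalization and essentially zero afterwards. If the shift were biological, that signal would be lost, which is the situation a data-driven check on the normalization assumptions is meant to flag.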
