Supervised Quantile Normalisation

Quantile normalisation is a popular normalisation method for data subject to unwanted variations, such as images, speech, or genomic data. It applies a monotonic transformation to the feature values of each sample so that, after normalisation, the values of every sample follow the same target distribution. Choosing a "good" target distribution, however, remains largely empirical and heuristic, and is usually done independently of the subsequent analysis of the normalised data. We propose instead to couple the quantile normalisation step with the subsequent analysis, and to optimise the target distribution jointly with the other parameters of the analysis. We illustrate this principle on the problem of estimating a linear model over normalised data, and show that it leads to a particular low-rank matrix regression problem that can be solved efficiently. We illustrate the potential of our method, which we term SUQUAN, on simulated data, images, and genomic data, where it outperforms standard quantile normalisation.
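To make the normalisation step concrete, here is a minimal Python sketch of standard quantile normalisation with a fixed target, followed by a small check of why a linear model over normalised data can be written in terms of a rank-one matrix. This is an illustration under stated assumptions, not the SUQUAN implementation: the names `quantile_normalise` and `rank_matrix` are hypothetical, ties are broken by position, the target is taken as the mean of the sorted samples (one common heuristic), and the weight vector `w` is arbitrary.

```python
import numpy as np


def ranks_of(x):
    """Return the 0-based rank of each entry of x (ties broken by position)."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=int)
    ranks[order] = np.arange(len(x))
    return ranks


def quantile_normalise(x, target):
    """Replace each value of x by the target quantile that has the same rank."""
    x = np.asarray(x, dtype=float)
    target = np.sort(np.asarray(target, dtype=float))
    if x.shape != target.shape:
        raise ValueError("x and target must have the same length")
    return target[ranks_of(x)]


def rank_matrix(x):
    """Permutation matrix P with P[i, r] = 1 when feature i has rank r."""
    x = np.asarray(x, dtype=float)
    p = np.zeros((len(x), len(x)))
    p[np.arange(len(x)), ranks_of(x)] = 1.0
    return p


# Two toy samples measured on very different scales.
samples = np.array([[5.0, 2.0, 3.0, 4.0],
                    [40.0, 10.0, 30.0, 20.0]])

# Heuristic target: the mean of the sorted samples (an assumption for this demo).
target = np.sort(samples, axis=1).mean(axis=0)

normalised = np.array([quantile_normalise(s, target) for s in samples])

# A linear score on normalised data is bilinear in (w, target):
# w . normalised = w^T P target = <w target^T, P>, with w target^T of rank one.
w = np.array([1.0, -2.0, 0.5, 3.0])  # arbitrary illustrative weights
p = rank_matrix(samples[0])
score_direct = w @ normalised[0]
score_bilinear = np.sum(np.outer(w, target) * p)
```

After normalisation, both rows contain exactly the same set of values (the target quantiles), arranged according to each sample's own ranking. The last lines check numerically that the linear score depends on the weights and the target only through the rank-one matrix `np.outer(w, target)`, which gives some intuition for how jointly optimising the two could lead to a low-rank matrix regression problem.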
