Batch Effect Removal via Batch-Free Encoding

Biological measurements often contain systematic errors, known as “batch effects”, which may invalidate downstream analysis when not handled correctly. Removing batch effects is therefore a problem of major importance in the biological community. Despite recent advances via deep learning techniques, most current methods may not fully preserve the true biological patterns the data contains. In this work we propose a deep learning approach for batch effect removal. The crux of our approach is learning a batch-free encoding of the data, which represents its intrinsic biological properties but not batch effects. In addition, we encode the systematic factors through a decoding mechanism and require accurate reconstruction of the data. Together, these components allow us to fully preserve the true biological patterns represented in the data. Experimental results are reported on data obtained from two high-throughput technologies, mass cytometry and single-cell RNA-seq. Beyond good performance on training data, we also observe that our system performs well on test data obtained from new patients, which was not available at training time. Our method is easy to use, and publicly available code can be found at https://github.com/ushaham/BatchEffectRemoval2018.
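
The idea of a shared batch-free code with batch-specific decoding can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository above for that). It assumes a single shared encoder, one linear decoder per batch, and a Gaussian-kernel maximum mean discrepancy (MMD) penalty to discourage batch information in the code; the abstract does not specify the penalty, and an adversarial term would be another option. All function names, layer sizes, and the weight `lam` are hypothetical, and only the loss is evaluated, with no training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Small random weights and zero bias (illustrative initialisation).
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def encode(x, params):
    # Shared encoder: one ReLU layer applied to every batch.
    W, b = params
    return np.maximum(x @ W + b, 0.0)

def decode(z, params):
    # Batch-specific linear decoder; it is meant to absorb batch effects.
    W, b = params
    return z @ W + b

def rbf_mmd2(a, b, sigma=1.0):
    # Biased estimate of squared MMD with a Gaussian kernel.
    def k(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

def batch_free_loss(x1, x2, enc, dec1, dec2, lam=1.0):
    # Reconstruction through each batch's own decoder, plus a penalty
    # pushing the code distributions of the two batches together.
    z1, z2 = encode(x1, enc), encode(x2, enc)
    recon = (np.mean((decode(z1, dec1) - x1) ** 2)
             + np.mean((decode(z2, dec2) - x2) ** 2))
    penalty = rbf_mmd2(z1, z2)
    return recon + lam * penalty, recon, penalty

def calibrate(x1, enc, dec2):
    # Map batch-1 samples to batch 2: encode, then decode as batch 2.
    return decode(encode(x1, enc), dec2)

# Toy data: batch 2 is batch 1's distribution plus a constant shift.
n_features, n_code = 5, 3
x1 = rng.normal(size=(64, n_features))
x2 = rng.normal(size=(64, n_features)) + 0.5

enc = init_layer(n_features, n_code)
dec1 = init_layer(n_code, n_features)
dec2 = init_layer(n_code, n_features)

total, recon, penalty = batch_free_loss(x1, x2, enc, dec1, dec2)
x1_as_batch2 = calibrate(x1, enc, dec2)
print(total, recon, penalty, x1_as_batch2.shape)
```

In a trained model of this shape, minimising the combined loss would push the code to be indistinguishable across batches while each decoder restores its own batch's systematic factors; `calibrate` then shows how samples from one batch could be re-expressed in the other.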
