Managing batch effects in microbiome data

Microbial communities have been increasingly studied in recent years to investigate their role in ecological habitats. However, microbiome studies are difficult to reproduce or replicate, as they may suffer from confounding factors that are unavoidable in practice and originate from biological, technical or computational sources. In this review, we define batch effects as unwanted variation introduced by confounding factors that are not related to any factors of interest. Computational and analytical methods are required to remove or account for batch effects. However, the inherent characteristics of microbiome data (e.g. sparsity, compositionality and multivariate structure) challenge the development and application of batch effect adjustment methods, whether they account for batch effects or correct for them. We present commonly encountered sources of batch effects and illustrate them in several case studies. We discuss the limitations of current methods, whose assumptions are often not met because of the peculiarities of microbiome data. We provide practical guidelines for assessing the efficiency of the methods based on visual and numerical outputs, and a thorough tutorial to reproduce the analyses conducted in this review.
