Bayesian sparse mediation analysis with targeted penalization of natural indirect effects

Causal mediation analysis aims to characterize an exposure's effect on an outcome and to quantify the indirect effect that acts through a given mediator or group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, such as the epigenome or the microbiome, new statistical methods are needed that can accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identifying active mediators in high-dimensional mediation analysis by penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and the mediator-outcome effect, using either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modeling the two parameters that contribute to the NIE, the proposed methods enable targeted penalization of their product. The resulting inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared with competing methods. We apply our methods to an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The active mediators identified in both studies reveal important biological pathways for understanding disease mechanisms.
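The joint-prior idea can be made concrete with a small simulation. Write α_j for the exposure-mediator effect and β_j for the mediator-outcome effect of mediator j. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the mixture weights, per-component standard deviations, diagonal covariances, thresholds, and the helper names (`sample_mixture_pair`, `product_threshold`, `nie`) are all hypothetical. It draws (α_j, β_j) pairs from a four-component Gaussian mixture (both effects active, only α_j active, only β_j active, neither active), applies an assumed product-threshold rule to latent Gaussian coefficients, and computes each mediator's NIE as α_j β_j, the product-of-coefficients form for a unit change in exposure under linear mediator and outcome models without exposure-mediator interaction.

```python
import random
from typing import Tuple

# Hypothetical hyperparameters chosen for illustration only (not from the paper).
# Components of the four-component mixture on (alpha_j, beta_j):
#   0: both effects active      -> NIE_j = alpha_j * beta_j can be sizable
#   1: only alpha_j active      (exposure -> mediator)
#   2: only beta_j active       (mediator -> outcome)
#   3: neither effect active
WEIGHTS = [0.05, 0.10, 0.10, 0.75]
SDS = [(1.0, 1.0), (1.0, 0.01), (0.01, 1.0), (0.01, 0.01)]


def sample_mixture_pair(rng: random.Random) -> Tuple[int, float, float]:
    """Draw (component, alpha_j, beta_j) from the mixture, assuming a diagonal
    covariance within each bivariate Gaussian component."""
    k = rng.choices(range(4), weights=WEIGHTS)[0]
    sd_a, sd_b = SDS[k]
    return k, rng.gauss(0.0, sd_a), rng.gauss(0.0, sd_b)


def product_threshold(a_lat: float, b_lat: float,
                      lam_a: float, lam_b: float,
                      lam_ab: float) -> Tuple[float, float]:
    """Assumed product-threshold map: each latent Gaussian coefficient is kept
    if it exceeds its own threshold OR if the latent product exceeds lam_ab;
    otherwise it is set exactly to zero."""
    big_product = abs(a_lat * b_lat) > lam_ab
    alpha = a_lat if (abs(a_lat) > lam_a or big_product) else 0.0
    beta = b_lat if (abs(b_lat) > lam_b or big_product) else 0.0
    return alpha, beta


def nie(alpha: float, beta: float) -> float:
    """NIE for one mediator per unit change in exposure (linear models, no
    exposure-mediator interaction): the product alpha_j * beta_j."""
    return alpha * beta


if __name__ == "__main__":
    rng = random.Random(2020)
    draws = [sample_mixture_pair(rng) for _ in range(10_000)]
    for k in range(4):
        abs_nies = [abs(nie(a, b)) for c, a, b in draws if c == k]
        mean_abs = sum(abs_nies) / len(abs_nies) if abs_nies else 0.0
        print(f"component {k}: n={len(abs_nies)}, mean |NIE|={mean_abs:.4f}")
```

Under this sketch, only component 0 produces NIEs of appreciable size; components 1 and 2 yield near-zero products even though one coefficient is large. This is why penalizing α_j and β_j separately does not directly target the NIE, whereas a joint prior on the pair can.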
