Mediation analysis for survival data with high-dimensional mediators

MOTIVATION Mediation analysis has become a widely used method for identifying causal pathway(s) between an independent variable and a dependent variable through intermediate variable(s). However, little work has been done for the setting in which the intermediate variables (mediators) are high-dimensional and the outcome is a survival endpoint. In this paper, we introduce a novel method to identify potential mediators in a causal framework of high-dimensional Cox regression.

RESULTS We first reduce the data dimension through a mediation-based sure independence screening (SIS) method. A de-biased Lasso inference procedure is then used for the Cox regression parameters. We adopt a multiple-testing procedure to accurately control the false discovery rate (FDR) when testing high-dimensional mediation hypotheses. Simulation studies are conducted to demonstrate the performance of our method. We apply this approach to explore the mediation mechanisms of 379,330 DNA methylation markers between smoking and overall survival among lung cancer patients in the TCGA lung cancer cohort. Two methylation sites (cg08108679 and cg26478297) are identified as potential mediating epigenetic markers.

AVAILABILITY Our proposed method is available in the R package HIMA at https://cran.r-project.org/web/packages/HIMA/.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
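
The screening and multiple-testing steps described under RESULTS can be illustrated with a minimal sketch. It is not the HIMA implementation. It assumes that per-mediator marginal scores and p-values for the exposure-to-mediator and mediator-to-survival effects have already been computed (for example by a de-biased Lasso fit, which is not shown). The cutoff d = n / log(n) is the conventional SIS choice and may differ from the one used in the paper. The Benjamini-Hochberg step applied to the maximum of the two p-values is a simpler stand-in for the paper's FDR procedure, and it is usually conservative. All function names are hypothetical.

```python
import math


def sis_top_d(scores, n_samples):
    """Keep the indices of the d largest absolute marginal scores.

    Assumption: d = ceil(n / log(n)), the conventional SIS cutoff.
    """
    d = min(len(scores), math.ceil(n_samples / math.log(n_samples)))
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return sorted(ranked[:d])


def joint_max_p(p_alpha, p_beta):
    """Joint-significance p-value: a mediator needs both effects to be nonzero."""
    return [max(a, b) for a, b in zip(p_alpha, p_beta)]


def benjamini_hochberg(pvals, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= q * rank / m:
            k = rank
    return sorted(order[:k])


# Toy inputs, made up purely for illustration.
scores = [0.1, -2.5, 0.3, 1.7, -0.2]
kept = sis_top_d(scores, n_samples=8)

p_alpha = [0.001, 0.20, 0.0005, 0.03]   # exposure -> mediator
p_beta = [0.004, 0.001, 0.60, 0.01]     # mediator -> survival
p_joint = joint_max_p(p_alpha, p_beta)
selected = benjamini_hochberg(p_joint, q=0.05)
```

In this toy example only the first mediator has small p-values on both paths, so it is the only one that survives the FDR step. The third mediator, despite a very strong exposure effect, is dropped because its effect on survival is not significant.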