Causal Mediation Analysis: Selection with Asymptotically Valid Inference

Researchers are often interested in learning not only the effect of treatments on outcomes, but also the pathways through which these effects operate. A mediator is a variable that is affected by the treatment and subsequently affects the outcome. Existing penalized mediation analyses may overlook important mediators, and they either assume that finite-dimensional linear models suffice to remove confounding bias or perform no confounding control at all. In practice, these assumptions may not hold. We propose a method that treats the confounding functions as nuisance parameters to be estimated using data-adaptive methods. We then apply a novel regularization method to the resulting objective function to identify a set of important mediators. We derive the asymptotic properties of our estimator and establish the oracle property under certain assumptions. Asymptotic results are also presented in a local setting, contrasting the proposal with the standard adaptive lasso. We also propose a perturbation bootstrap technique to provide asymptotically valid post-selection inference for the mediated effects of interest. The performance of these methods is demonstrated through simulation studies.
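As a point of reference for the comparison drawn above, the following is a minimal sketch of the standard adaptive lasso, not of the proposed estimator. It assumes a plain linear model, an ordinary least squares pilot fit for the penalty weights, and a simple coordinate descent solver. The function names, the simulated data, and the tuning value `lam = 20.0` are hypothetical choices made only for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def adaptive_lasso(X, y, lam, gamma=1.0, n_iter=200):
    """Minimize 0.5 * ||y - X b||^2 + lam * sum_j w_j * |b_j|
    with weights w_j = 1 / |b_ols_j|^gamma (illustrative sketch)."""
    n, p = X.shape
    # Pilot estimate from ordinary least squares (an assumption of this sketch).
    beta_init, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Small constant guards against division by zero.
    weights = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)
    beta = beta_init.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that leaves out coordinate j.
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j
            beta[j] = soft_threshold(z, lam * weights[j]) / col_sq[j]
    return beta


# Hypothetical simulated data: two active coefficients, four null ones.
rng = np.random.default_rng(0)
n, p = 500, 6
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

beta_hat = adaptive_lasso(X, y, lam=20.0)
selected = np.flatnonzero(beta_hat)  # indices of coefficients kept in the model
```

Because the weights grow as the pilot estimates shrink, coefficients with small pilot estimates are penalized heavily and set to exactly zero, while large coefficients are shrunk only slightly. This is the behavior behind the oracle property of the adaptive lasso.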
