Generalizing Off-Policy Evaluation From a Causal Perspective For Sequential Decision-Making

Assessing the effects of a policy based on observational data collected under a different policy is a common problem across several high-stakes decision-making domains, and several off-policy evaluation (OPE) techniques have been proposed for this purpose. However, these methods largely formulate OPE as a problem disassociated from the process used to generate the data (i.e., from structural assumptions expressed in the form of a causal graph). We argue that explicitly highlighting this association has important implications for our understanding of the fundamental limits of OPE. First, it implies that the current formulation of OPE corresponds to a narrow set of tasks, i.e., a specific causal estimand focused on prospective evaluation of policies over populations or sub-populations. Second, we demonstrate how this association motivates natural desiderata for considering a more general set of causal estimands, in particular extending OPE to counterfactual or retrospective off-policy evaluation at the level of individual units of the population (e.g., individual patients). Further, a precise description of the causal estimand shows which OPE estimands are identifiable from observational data under stated generative assumptions. For those OPE estimands that are not identifiable from observational data, the causal perspective shows where additional experimental data are necessary for identification, and thus points to situations where human expertise can aid identification and estimation. Furthermore, many formalisms of OPE overlook the role of uncertainty in the estimation process entirely. We demonstrate how specifically characterising the causal estimand exposes the different sources of uncertainty; the role of human expertise then follows naturally in managing the induced uncertainty. We discuss each of these aspects as actionable desiderata for future OPE research at scale and in line with practical utility.
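As a concrete reference point for the population-level, prospective estimand described above, the following is a minimal sketch of trajectory-wise importance sampling, one standard OPE estimator. It is an illustration and not the paper's method. It assumes no unobserved confounding and that the behaviour policy gives non-zero probability to every logged action (overlap). The names `is_estimate`, `pi_e`, and `pi_b` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One logged step: (state, action, reward). Integer states/actions are an
# assumption made only to keep the sketch small.
Step = Tuple[int, int, float]
Policy = Callable[[int, int], float]  # returns P(action | state)


def is_estimate(
    trajectories: Sequence[List[Step]],
    pi_e: Policy,
    pi_b: Policy,
    gamma: float = 1.0,
) -> float:
    """Ordinary (trajectory-wise) importance sampling estimate of the
    expected discounted return of pi_e from data logged under pi_b.

    Assumes pi_b(s, a) > 0 for every logged (s, a) pair and that the
    logged states capture everything that influenced action choice.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # product of per-step probability ratios
        ret = 0.0     # discounted return of this trajectory
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(s, a) / pi_b(s, a)
            ret += (gamma ** t) * r
        total += weight * ret
    return total / len(trajectories)


# Toy usage: behaviour policy is uniform over two actions; the evaluation
# policy always picks action 1, which yields reward 1.
uniform = lambda s, a: 0.5
always_one = lambda s, a: 1.0 if a == 1 else 0.0
logged = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
estimate = is_estimate(logged, always_one, uniform)
```

The estimate is an average over logged trajectories, so it targets a population-level quantity. It says nothing about what would have happened to a particular unit under a different policy, which is the kind of individual-level counterfactual estimand the abstract argues OPE should also cover.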
