Predictive Performance Comparison of Decision Policies Under Confounding

Predictive models are often introduced into decision-making tasks on the grounds that they will improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing policy that is typically under-specified and depends on unobservable factors. In practice, these sources of uncertainty are often addressed by making strong assumptions about the data-generating mechanism. In this work, we propose a method to compare the predictive performance of decision policies under a variety of modern identification approaches from the causal inference and off-policy evaluation literatures (e.g., instrumental variable, marginal sensitivity model, proximal variable). Key to our method is the insight that there are regions of uncertainty that we can safely ignore in the policy comparison. We develop a practical approach for finite-sample estimation of regret intervals that makes no assumptions about the parametric form of the status quo policy. We verify our framework theoretically and via synthetic data experiments. We conclude with a real-world application in which we use our framework to support a pre-deployment evaluation of a proposed modification to a healthcare enrollment policy.
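To make the idea of "regions of uncertainty that we can safely ignore" concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are ours, not the paper's. Decisions and outcomes are binary, the loss is 0-1, and some identification approach (for example, a sensitivity model) supplies pointwise bounds [mu_lo, mu_hi] on mu(x) = P(Y = 1 | X = x). The function names `regret_interval`, `error_bounds`, and `naive_interval` are hypothetical. Under these assumptions, units where the proposed and status quo policies agree contribute zero to regret, however wide their bounds are. Bounding the paired difference directly can therefore be much tighter than differencing separately computed bounds on each policy's error. This illustrates the cancellation idea only. It is not the finite-sample estimator described in the paper.

```python
import numpy as np

# Illustrative assumptions (not from the paper): binary decisions d in {0, 1},
# binary outcome Y, 0-1 loss l(d, Y) = 1[d != Y], and pointwise bounds
# mu_lo <= mu(x_i) = P(Y = 1 | X = x_i) <= mu_hi from some identification approach.


def regret_interval(pi_new, pi_sq, mu_lo, mu_hi):
    """Bounds on mean[l(pi_new, Y) - l(pi_sq, Y)], assuming each mu(x_i) can
    vary independently within its interval.

    Positive regret means the proposed policy has higher expected error.
    """
    s = np.asarray(pi_new, dtype=int) - np.asarray(pi_sq, dtype=int)  # in {-1, 0, 1}
    mu_lo = np.asarray(mu_lo, dtype=float)
    mu_hi = np.asarray(mu_hi, dtype=float)
    # The per-unit expected loss difference is s * (1 - 2 * mu). It is linear in mu,
    # so its extremes sit at the interval endpoints. Where the policies agree (s = 0),
    # the term is 0 no matter how wide [mu_lo, mu_hi] is.
    at_lo = s * (1.0 - 2.0 * mu_lo)
    at_hi = s * (1.0 - 2.0 * mu_hi)
    return np.minimum(at_lo, at_hi).mean(), np.maximum(at_lo, at_hi).mean()


def error_bounds(pi, mu_lo, mu_hi):
    """Bounds on one policy's mean 0-1 error: expected loss is 1 - mu if d = 1, else mu."""
    pi = np.asarray(pi, dtype=int)
    mu_lo = np.asarray(mu_lo, dtype=float)
    mu_hi = np.asarray(mu_hi, dtype=float)
    lo = np.where(pi == 1, 1.0 - mu_hi, mu_lo)
    hi = np.where(pi == 1, 1.0 - mu_lo, mu_hi)
    return lo.mean(), hi.mean()


def naive_interval(pi_new, pi_sq, mu_lo, mu_hi):
    """Difference of separately bounded errors; ignores that both policies share mu."""
    lo_new, hi_new = error_bounds(pi_new, mu_lo, mu_hi)
    lo_sq, hi_sq = error_bounds(pi_sq, mu_lo, mu_hi)
    return lo_new - hi_sq, hi_new - lo_sq


# Toy data: units 1 and 2 have uninformative bounds, but the policies agree there.
pi_sq = [0, 0, 1, 1]
pi_new = [1, 0, 1, 0]
mu_lo = [0.1, 0.0, 0.0, 0.6]
mu_hi = [0.3, 1.0, 1.0, 0.8]
print("paired regret interval:", regret_interval(pi_new, pi_sq, mu_lo, mu_hi))
print("naive regret interval: ", naive_interval(pi_new, pi_sq, mu_lo, mu_hi))
```

On these toy inputs, the paired interval is approximately [0.15, 0.35] and excludes zero, so the proposed policy is worse under every admissible mu. The naive interval is approximately [-0.35, 0.85] and cannot distinguish the policies. The gap comes entirely from the two agreement units, whose [0, 1] bounds inflate each policy's error bounds but cancel in the paired difference.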
