Operationalizing Counterfactual Metrics: Incentives, Ranking, and Information Asymmetry

From the social sciences to machine learning, it is well documented that the metrics chosen for optimization are not always aligned with social welfare. In healthcare, Dranove et al. [12] showed that publishing surgery mortality metrics actually harmed the welfare of sicker patients by increasing provider selection behavior. Using a principal-agent model, we directly study the incentive misalignments that arise from such average treated outcome metrics. We show that the incentives driving treatment decisions would align with maximizing total patient welfare if the metrics (i) accounted for counterfactual untreated outcomes and (ii) considered total welfare instead of average welfare among treated patients. Operationalizing this, we show how counterfactual metrics can be modified to satisfy desirable properties when used for ranking. Extending to realistic settings where providers observe more about patients than regulatory agencies do, we bound the decay in performance by the degree of information asymmetry between the principal and the agent. In doing so, our model connects principal-agent information asymmetry with unobserved heterogeneity in causal inference.
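The contrast between the two kinds of metric can be illustrated with a toy sketch. This is not the paper's model: the four patients, their outcome values, and all function names below are hypothetical, and the provider is assumed to observe both potential outcomes for every patient. Under those assumptions, a provider who maximizes the average treated outcome treats only the healthiest patient, while a provider who maximizes the total gain over untreated outcomes picks the same patients that maximize total welfare.

```python
from itertools import combinations

# Hypothetical patients: (name, untreated outcome y0, treated outcome y1).
# Outcomes are illustrative survival percentages, not data from the paper.
PATIENTS = [
    ("healthy", 90, 95),
    ("moderate", 60, 80),
    ("sick", 20, 60),
    ("frail", 70, 50),  # treatment harms this patient
]


def average_treated_outcome(treated):
    """Mean treated outcome among treated patients (ignores y0)."""
    if not treated:
        return float("-inf")  # metric is undefined when nobody is treated
    return sum(y1 for _, _, y1 in treated) / len(treated)


def total_counterfactual_gain(treated):
    """Sum of (y1 - y0) over treated patients."""
    return sum(y1 - y0 for _, y0, y1 in treated)


def total_welfare(treated):
    """Sum of realized outcomes over ALL patients, treated or not."""
    treated_names = {name for name, _, _ in treated}
    return sum(y1 if name in treated_names else y0 for name, y0, y1 in PATIENTS)


def best_treatment_set(metric):
    """Brute-force the treatment set that maximizes the given metric."""
    subsets = [
        subset
        for size in range(len(PATIENTS) + 1)
        for subset in combinations(PATIENTS, size)
    ]
    best = max(subsets, key=metric)
    return sorted(name for name, _, _ in best)


if __name__ == "__main__":
    print("average treated outcome:", best_treatment_set(average_treated_outcome))
    print("total counterfactual gain:", best_treatment_set(total_counterfactual_gain))
    print("total welfare:", best_treatment_set(total_welfare))
```

In this toy setting, total welfare equals the sum of untreated outcomes plus the total counterfactual gain, so the two objectives always select the same treatment set. The sketch does not cover the information-asymmetry case, where the regulator cannot observe everything the provider does.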

[1] Lydia T. Liu, et al. Reimagining the machine learning life cycle to improve educational outcomes of students, 2023, Proceedings of the National Academy of Sciences of the United States of America.

[2] Paul Dütting, et al. Bayesian Analysis of Linear Contracts, 2022, EC.

[3] Correction to: Why Are Counterfactual Assessment Methods Not Widespread in Outcome-Based Contracts? A Formal Model Approach, 2022, Journal of Public Administration Research and Theory.

[4] Stephen Bates, et al. Principal-Agent Hypothesis Testing, 2022, ArXiv.

[5] Edward H. Kennedy. Semiparametric doubly robust targeted double machine learning: a review, 2022, 2203.06469.

[6] A. Blum, et al. On classification of strategic agents who can both game and improve, 2022, FORC.

[7] L. Glance, et al. Association Between the Physician Quality Score in the Merit-Based Incentive Payment System and Hospital Performance in Hospital Compare in the First Year of the Program, 2021, JAMA network open.

[8] Stacy E. Lom. The Metric Society: On the Quantification of the Social, 2020.

[9] D. Kazi, et al. Quality Measure Development and Associated Spending by the Centers for Medicare & Medicaid Services, 2020, JAMA.

[10] Benjamin Edelman, et al. Learning From Strategic Agents: Accuracy, Improvement, and Causality, 2020, ICML 2020.

[11] Zhiwei Steven Wu, et al. Causal Feature Discovery through Strategic Modification, 2020, ArXiv.

[12] Celestine Mendler-Dünner, et al. Performative Prediction, 2020, ICML.

[13] Tim Roughgarden, et al. The Complexity of Contracts, 2020, SODA.

[14] Hanna M. Wallach, et al. Measurement and Fairness, 2019, FAccT.

[15] Brian W. Powers, et al. Dissecting racial bias in an algorithm used to manage the health of populations, 2019, Science.

[16] Moritz Hardt, et al. Strategic Classification is Causal Modeling in Disguise, 2019, ICML.

[17] Ryan J. Gagnon, et al. Measurement Theory and Applications for the Social Sciences, 2019, Measurement: Interdisciplinary Research and Perspectives.

[18] S. Panguluri, et al. Cardiovascular Risks Associated with Gender and Aging, 2019, Journal of cardiovascular development and disease.

[19] Simone Raudino, et al. The Tyranny of Metrics, 2019, The European Legacy.

[20] Tim Roughgarden, et al. Simple versus Optimal Contracts, 2018, EC.

[21] Jon M. Kleinberg, et al. How Do Classifiers Induce Agents to Invest Effort Strategically?, 2018, EC.

[22] Nathan Kallus, et al. Confounding-Robust Policy Improvement, 2018, NeurIPS.

[23] D. Koretz. The Testing Charade: Pretending to Make Schools Better, 2017.

[24] Jessica M. Ameling, et al. Dissecting Leapfrog: How Well Do Leapfrog Safe Practices Scores Correlate With Hospital Compare Ratings and Penalties, and How Much Do They Matter?, 2017, Medical care.

[25] Stefan Wager, et al. Policy Learning With Observational Data, 2017, Econometrica.

[26] G. Berdine. Uncertainty and the welfare economics of medical care: an Austrian rebuttal Part 2, 2017.

[27] Aleksey Tetenov, et al. An economic theory of statistical testing, 2016.

[28] L. Casalino, et al. US Physician Practices Spend More Than $15.4 Billion Annually To Report Quality Measures, 2016, Health affairs.

[29] Christos H. Papadimitriou, et al. Strategic Classification, 2015, ITCS.

[30] Gabriel D. Carroll. Robustness and Linear Contracts, 2015.

[31] A. Ghaferi, et al. Hospital Safety Scores: do grades really matter?, 2014, JAMA surgery.

[32] W. Hwang, et al. Hospital patient safety grades may misrepresent hospital performance, 2014, Journal of hospital medicine.

[33] R. Gibbons, et al. The Handbook of Organizational Economics, 2012.

[34] Marlena H. Shin, et al. Validity of selected Patient Safety Indicators: opportunities and concerns, 2011, Journal of the American College of Surgeons.

[35] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[36] Donald T. Campbell, et al. Assessing the Impact of Planned Social Change, 2010, Journal of MultiDisciplinary Evaluation.

[37] Judea Pearl, et al. On a Class of Bias-Amplifying Variables that Endanger Effect Estimates, 2010, UAI.

[38] K. Hirano, et al. Asymptotics for Statistical Treatment Rules, 2009.

[39] A. Ryan, et al. The Relationship between Medicare's Process of Care Quality Measures and Mortality, 2009, Inquiry: a journal of medical care organization, provision and financing.

[40] Jörg Stoye, et al. Minimax regret treatment choice with finite samples, 2009.

[41] A. Jha, et al. Does the Leapfrog program help identify high-quality hospitals?, 2008, Joint Commission journal on quality and patient safety.

[42] R. Rothstein. Holding Accountability to Account: How Scholarship and Experience in Other Fields Inform Exploration of Performance Incentives in Education. Working Paper 2008-04, 2008.

[43] Charles F. Manski, et al. Identification for Prediction and Decision, 2008.

[44] K. Mukamal. The Effects of Smoking and Drinking on Cardiovascular Disease and Risk Factors, 2006, Alcohol research & health: the journal of the National Institute on Alcohol Abuse and Alcoholism.

[45] J. Robins, et al. Estimating causal effects from epidemiological data, 2006, Journal of Epidemiology and Community Health.

[46] Sean M. McNee, et al. Being accurate is not enough: how accuracy metrics have hurt recommender systems, 2006, CHI Extended Abstracts.

[47] M. Strathern. ‘Improving ratings’: audit in the British University system, 1997, European Review.

[48] Peter Sandercock, et al. The International Stroke Trial (IST): a randomised trial of aspirin, subcutaneous heparin, both, or neither among 19 435 patients with acute ischaemic stroke, 1997, The Lancet.

[49] Paul R. Milgrom, et al. Economics, Organization and Management, 1992.

[50] D. V. Lindley, et al. Randomization Analysis of Experimental Data: The Fisher Randomization Test Comment, 1980.

[51] D. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies, 1974.

[52] Jann Spiess, et al. Optimal Estimation when Researcher and Social Preferences are Misaligned, 2017.

[53] Paul R. Milgrom, et al. The economic nature of the firm: Multitask principal–agent analyses: incentive contracts, asset ownership, and job design, 2009.

[54] M. David. The Theory of Incentives, 2006.

[55] Daniel P. Kessler, et al. Is More Information Better? The Effects of “Report Cards” on Health Care Providers, 2003, Journal of Political Economy.

[56] J. Mouritsen, et al. Accountability: power, ethos, and the technologies of managing, 1996.

[57] M. Power. The Audit Explosion, 1994.

[58] C. Goodhart. Problems of Monetary Management: The UK Experience, 1984.