Fair inference on error-prone outcomes

Fair inference in supervised learning is an important and active area of research, yielding a range of useful methods for assessing and accounting for fairness criteria when predicting ground-truth targets. Recent work has shown, however, that when target labels are error-prone, measurement error can itself give rise to prediction unfairness. In this paper, we show that when an error-prone proxy target is used, existing methods for assessing and calibrating fairness criteria do not extend to the true target variable of interest. To remedy this problem, we propose a framework that combines two existing literatures: fair ML methods, such as those in the counterfactual fairness literature, on the one hand, and measurement models from the statistical literature on the other. We discuss both approaches and the connection between them that yields our framework. In a healthcare decision problem, we find that using a latent variable model to account for measurement error removes the previously detected unfairness.
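The core claim, that fairness metrics evaluated against an error-prone proxy need not hold for the true target, can be illustrated with a small simulation. The sketch below is a hypothetical example, not the paper's healthcare model: the variable names, the group-dependent error rates, and the logistic-regression classifier are all illustrative assumptions. It trains on a proxy label with differential measurement error and compares a false-negative-rate gap computed against the proxy with the same gap computed against the (here simulated, in practice latent) true target.

```python
# Hypothetical sketch: group fairness measured against an error-prone proxy
# target versus the true target. All quantities below are illustrative
# assumptions, not the paper's actual data or model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Protected attribute A and features X that predict the true target Y.
A = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 3))
logit_y = X @ np.array([1.0, -0.5, 0.8])
Y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit_y))).astype(int)

# Error-prone proxy Y_obs: measurement error depends on group membership
# (differential measurement error), e.g. under-recording of positives in group A=1.
miss_rate = np.where(A == 1, 0.30, 0.05)          # assumed error rates
flip = (Y == 1) & (rng.uniform(size=n) < miss_rate)
Y_obs = np.where(flip, 0, Y)

# Train on the proxy, as one would when the true target is unobserved.
features = np.column_stack([X, A])
clf = LogisticRegression().fit(features, Y_obs)
Y_hat = clf.predict(features)

def fnr_gap(y_true, y_pred, group):
    """Difference in false-negative rates between groups (one equalized-odds term)."""
    rates = []
    for g in (0, 1):
        pos = (group == g) & (y_true == 1)
        rates.append(np.mean(y_pred[pos] == 0))
    return rates[1] - rates[0]

print("FNR gap vs. proxy target:", round(fnr_gap(Y_obs, Y_hat, A), 3))
print("FNR gap vs. true target: ", round(fnr_gap(Y, Y_hat, A), 3))
```

In this setup the proxy-based audit and the audit against the true target can disagree, which is the motivation for replacing the simulated true labels above with a latent variable (measurement) model when, as in practice, the true target is unobserved.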
