Auditing Fairness by Betting

We provide practical, efficient, and nonparametric methods for auditing the fairness of deployed classification and regression models. Whereas previous work relies on a fixed sample size, our methods are sequential and allow for the continuous monitoring of incoming data, making them well suited to tracking the fairness of real-world systems. We also allow the data to be collected by a probabilistic policy rather than sampled uniformly from the population, which enables auditing to be conducted on data gathered for another purpose. Moreover, this policy may change over time, and different policies may be used on different subpopulations. Finally, our methods can handle distribution shift resulting from either changes to the model or changes in the underlying population. Our approach is based on recent progress in anytime-valid inference and game-theoretic statistics, the "testing by betting" framework in particular. These connections ensure that our methods are interpretable, fast, and easy to implement. We demonstrate the efficacy of our methods on several benchmark fairness datasets.
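To make the "testing by betting" idea concrete, the following is a minimal sketch of a sequential audit, not the authors' implementation. It assumes a simplified setting that the abstract does not specify: model outputs lie in [0, 1], observations arrive as pairs (one per group) sampled uniformly, and the null hypothesis is that both groups have equal mean output. The function name `audit_by_betting`, the pairing scheme, and the Online Newton Step bet update are illustrative choices; the policy-based data collection and distribution shift described above are not handled here.

```python
import math
from typing import Iterable, Optional, Tuple


def audit_by_betting(
    pairs: Iterable[Tuple[float, float]], alpha: float = 0.05
) -> Optional[int]:
    """Sequentially test H0: both groups have the same mean model output.

    `pairs` yields (output_group_a, output_group_b), each assumed in [0, 1].
    Returns the 1-indexed time at which H0 is rejected, or None if the
    stream ends without a rejection.
    """
    wealth = 1.0  # the auditor starts with one unit of (fictitious) wealth
    lam = 0.0     # current bet; chosen before seeing the next pair
    a = 1.0       # running sum of squared gradients for the bet update
    step = 2.0 / (2.0 - math.log(3.0))  # step size often used with this update

    for t, (y_a, y_b) in enumerate(pairs, start=1):
        g = y_a - y_b  # payoff in [-1, 1]; mean zero under H0
        # With |lam| <= 1/2 the factor is at least 1/2, so wealth stays positive.
        wealth *= 1.0 + lam * g
        # Under H0 the wealth is a nonnegative martingale, so by Ville's
        # inequality it exceeds 1/alpha with probability at most alpha.
        if wealth >= 1.0 / alpha:
            return t
        # Update the bet toward the direction that would have grown log-wealth.
        z = g / (1.0 + lam * g)
        a += z * z
        lam = max(-0.5, min(0.5, lam + step * z / a))
    return None
```

Because the rejection rule is valid at every time step, the auditor can check the wealth after each new pair and stop as soon as it crosses `1 / alpha`, which is what distinguishes this from a fixed-sample test. When the two groups always receive identical outputs, the payoff is zero and the wealth never moves, so no rejection occurs.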
