Model Multiplicity: Opportunities, Concerns, and Solutions

Recent scholarship has drawn attention to the fact that there often exist multiple models for a given prediction task with equal accuracy that differ in their individual-level predictions or aggregate properties. This phenomenon, which we call model multiplicity, introduces considerable flexibility into the model selection process, creating a range of exciting opportunities. By demonstrating that there are many different ways of making equally accurate predictions, multiplicity gives model developers the freedom to prioritize other values in model selection without abandoning their commitment to maximizing accuracy. However, multiplicity also brings to light a concerning truth: model selection on the basis of accuracy alone, the default procedure in many deployment scenarios, fails to consider potentially meaningful differences between equally accurate models with respect to other criteria such as fairness, robustness, and interpretability. Unless these criteria are taken into account explicitly, developers may make unnecessary trade-offs or even mask intentional discrimination. Furthermore, the prospect that another model of equal accuracy might flip the prediction for a particular individual can lead to a crisis of justifiability: why should an individual be subject to an adverse model outcome if there exists an equally accurate model that treats them more favorably? In this work, we investigate how to take advantage of the flexibility afforded by model multiplicity while addressing the concerns about justifiability that it raises.
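The core phenomenon is easy to reproduce empirically. Below is a minimal sketch (assuming scikit-learn and NumPy; the synthetic dataset and model choices are illustrative, not the paper's experimental setup) that trains two classifiers differing only in their random seed. The two models typically reach near-identical test accuracy yet disagree on the predictions for some individuals, which is exactly the individual-level disagreement that underlies the justifiability concern described above.

```python
# Minimal illustration of model multiplicity: two models that differ only in
# an arbitrary training choice (the random seed) achieve near-equal accuracy
# but disagree on individual-level predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic task.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Same hypothesis class and hyperparameters; only the seed differs.
model_a = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
model_b = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)

acc_a = model_a.score(X_test, y_test)
acc_b = model_b.score(X_test, y_test)

# Fraction of test individuals whose predicted outcome flips between the
# two (near-)equally accurate models.
disagreement = np.mean(model_a.predict(X_test) != model_b.predict(X_test))

print(f"accuracy A: {acc_a:.3f}  accuracy B: {acc_b:.3f}")
print(f"share of individuals whose prediction flips: {disagreement:.1%}")
```

On real, noisier datasets the flipped-prediction rate can be substantially larger; every individual in that disagreement set receives an outcome determined by an arbitrary developer choice rather than by anything about them, which is the arbitrariness the rest of the paper takes up.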
