Keeping Designers in the Loop: Communicating Inherent Algorithmic Trade-offs Across Multiple Objectives

Artificial intelligence algorithms have been used to enhance a wide variety of products and services, including assisting human decision making in high-stakes contexts. However, these algorithms are complex and involve inherent trade-offs, notably between prediction accuracy and fairness to population subgroups. This makes it hard for designers to understand the algorithms and to design products or services in a way that respects users' goals, values, and needs. We propose a method that helps designers and users explore algorithms, visualize their trade-offs, and select algorithms whose trade-offs are consistent with their goals and needs. We evaluate our method on the problem of predicting criminal defendants' likelihood of re-offending through (i) a large-scale Amazon Mechanical Turk experiment and (ii) in-depth interviews with domain experts. Our evaluations show that the method helps designers and users of these systems better understand and navigate algorithmic trade-offs. This paper contributes a new way of giving designers the ability to understand and control the outcomes of the algorithmic systems they create.
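To make the accuracy-versus-fairness trade-off concrete, the sketch below scores a set of candidate models on overall accuracy and on the gap in false-positive rates between two population subgroups, the kind of side-by-side comparison a designer could use when choosing among models. This is an illustrative Python sketch only, not the paper's actual method or tooling: the function names (subgroup_fairness_gap, tradeoff_table), the choice of fairness metric, and the assumption of fitted models exposing a scikit-learn-style predict method are all hypothetical.

```python
import numpy as np

def subgroup_fairness_gap(y_true, y_pred, group):
    """Absolute difference in false-positive rates between two subgroups.
    One common group-fairness metric; other metrics could be swapped in."""
    fprs = []
    for g in np.unique(group):          # assumes exactly two subgroup labels
        negatives = (group == g) & (y_true == 0)
        fprs.append(y_pred[negatives].mean() if negatives.any() else 0.0)
    return abs(fprs[0] - fprs[1])

def tradeoff_table(models, X, y_true, group):
    """Score each candidate model on accuracy and the fairness gap,
    so the trade-offs can be compared side by side."""
    rows = []
    for name, model in models.items():  # hypothetical dict of fitted models
        y_pred = model.predict(X)
        accuracy = (y_pred == y_true).mean()
        gap = subgroup_fairness_gap(y_true, y_pred, group)
        rows.append((name, accuracy, gap))
    return rows
```

Plotting the resulting (accuracy, gap) pairs would show which candidate models sit on the trade-off frontier, which is one simple way to surface the trade-offs the abstract describes.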
