Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need?

The potential for machine learning (ML) systems to amplify social inequities and unfairness is receiving increasing popular and academic attention. A surge of recent work has focused on the development of algorithmic tools to assess and mitigate such unfairness. If these tools are to have a positive impact on industry practice, however, it is crucial that their design be informed by an understanding of real-world needs. Through 35 semi-structured interviews and an anonymous survey of 267 ML practitioners, we conduct the first systematic investigation of commercial product teams' challenges and needs for support in developing fairer ML systems. We identify areas of alignment and disconnect between the challenges faced by teams in practice and the solutions proposed in the fair ML research literature. Based on these findings, we highlight directions for future ML and HCI research that will better address practitioners' needs.
