AI auditing: The Broken Bus on the Road to AI Accountability

One of the most concrete measures toward meaningful AI accountability is to consequentially assess and report on a system's performance and impact. However, the practical nature of the "AI audit" ecosystem is muddled and imprecise, making it difficult to navigate the various concepts involved and to map the stakeholders engaged in the practice. First, we taxonomize current AI audit practices as completed by regulators, law firms, civil society, journalists, academics, and consulting agencies. Next, we assess the impact of audits conducted by stakeholders within each domain. We find that only a subset of AI audit studies translate into desired accountability outcomes. We thus assess and isolate the practices necessary for effective AI audit results, articulating the observed connections between an audit's design, methodology, and institutional context and its effectiveness as a meaningful mechanism for accountability.
