Moving beyond “algorithmic bias is a data problem”

A surprisingly sticky belief is that a machine learning model merely reflects existing bias in its training data and does not itself contribute to harm. Why, despite clear evidence to the contrary, does the myth of the impartial model still hold allure for so many in our research community? Algorithms are not impartial, and some design choices are better than others: decisions such as pruning a network or training with differential privacy can concentrate errors on underrepresented groups even when aggregate accuracy barely moves. Recognizing how model design contributes to harm opens up new mitigation techniques that are far less burdensome than comprehensive data collection.
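
To make the design-choice claim concrete, here is a minimal auditing sketch. Everything in it is hypothetical: the group labels and the per-group accuracy numbers are invented to mimic the pattern reported for compressed models, where a pruned network matches the baseline's overall accuracy while shifting its new errors onto the underrepresented group. The point is only that a per-group breakdown surfaces a disparity that the aggregate number hides.

```python
# Hypothetical audit sketch (NumPy only): compare per-group accuracy for a
# baseline model vs. a pruned one. All outcomes below are simulated, not
# drawn from real models or data.
import numpy as np

def per_group_accuracy(correct: np.ndarray, groups: np.ndarray) -> dict:
    """Accuracy per subgroup, given a boolean per-example correctness mask."""
    return {str(g): float(correct[groups == g].mean()) for g in np.unique(groups)}

rng = np.random.default_rng(0)
n = 10_000
groups = rng.choice(np.array(["majority", "minority"]), size=n, p=[0.9, 0.1])

# Simulated per-example correctness: the pruned model keeps majority-group
# accuracy intact but degrades sharply on the minority group.
p_base = np.where(groups == "majority", 0.95, 0.93)
p_pruned = np.where(groups == "majority", 0.95, 0.80)
base_correct = rng.random(n) < p_base
pruned_correct = rng.random(n) < p_pruned

for name, correct in [("baseline", base_correct), ("pruned", pruned_correct)]:
    by_group = per_group_accuracy(correct, groups)
    gap = max(by_group.values()) - min(by_group.values())
    print(f"{name}: overall={correct.mean():.3f} per-group={by_group} gap={gap:.3f}")
```

Because the minority group is only 10% of the examples, the overall accuracies of the two models look nearly identical here; only the per-group gap reveals where the pruned model's errors landed.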
