50 Years of Test (Un)fairness: Lessons for Machine Learning

Quantitative definitions of what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning. We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely been overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way towards future research and measurement of (un)fairness that builds on our modern understanding of fairness while incorporating insights from the past.
