Measuring Fairness under Unawareness via Quantification

Alessandro Fabris
Dipartimento di Ingegneria dell'Informazione, Università di Padova
Via Giovanni Gradenigo 6B – 35131 Padova, Italy
E-mail: fabrisal@dei.unipd.it

Andrea Esuli · Alejandro Moreo · Fabrizio Sebastiani
Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche
Via Giuseppe Moruzzi 1 – 56124 Pisa, Italy
E-mail: firstname.lastname@isti.cnr.it

Models trained by means of supervised learning are increasingly deployed in high-stakes domains, and, when their predictions inform decisions about people, they inevitably affect (positively or negatively) their lives. As a consequence, those in charge of developing these models must carefully evaluate their impact on different groups of people and ensure that sensitive demographic attributes, such as race or sex, do not result in unfair treatment of members of specific groups. To do this, awareness of demographic attributes on the part of those evaluating model impacts is fundamental. Unfortunately, the collection of these attributes often conflicts with industry practices and with legislation on data minimization and privacy. For this reason, it may be hard to measure the group fairness of trained models, even from within the companies developing them. In this work, we tackle the problem of measuring group fairness under unawareness of sensitive attributes by using techniques from quantification, a supervised learning task concerned with directly providing group-level prevalence estimates (rather than individual-level class labels). We identify five important factors that complicate the estimation of fairness under unawareness and formalize them into five different experimental protocols, under which we assess the effectiveness of different estimators of group fairness. We also consider the problem of potential model misuse to infer sensitive attributes at the individual level, and demonstrate that quantification approaches are suitable for decoupling the (desirable) objective of measuring group fairness from the (undesirable) objective of inferring sensitive attributes of individuals.
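To make the quantification idea concrete, the sketch below illustrates one way group fairness could be estimated without observing sensitive attributes at deployment time. This is not the authors' implementation or any of their five experimental protocols: it uses synthetic data, scikit-learn's LogisticRegression as a stand-in attribute classifier, and the Adjusted Classify & Count (ACC) quantifier to estimate group prevalences among accepted and rejected individuals, from which a demographic-parity-style gap is derived. All variable names and data in it are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): estimating a demographic-parity gap
# under unawareness of sensitive attributes, via quantification (ACC).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: features X, sensitive attribute s, model decisions d.
n = 5000
X = rng.normal(size=(n, 5))
s = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)                   # unobserved at deployment
d = (X[:, 1] + 0.3 * X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)   # decisions of the audited model

# Small auxiliary set with known sensitive attributes vs. deployment set without them.
X_aux, X_dep, s_aux, s_dep, d_aux, d_dep = train_test_split(
    X, s, d, test_size=0.7, random_state=0
)

# Attribute classifier h: predicts the sensitive attribute from features.
h = LogisticRegression().fit(X_aux, s_aux)

def acc_prevalence(h, X_labelled, s_labelled, X_target):
    """Adjusted Classify & Count: correct the raw Classify & Count estimate
    using h's true/false positive rates (here estimated on the training data
    for simplicity; cross-validated estimates would be preferable)."""
    tpr = h.predict(X_labelled[s_labelled == 1]).mean()
    fpr = h.predict(X_labelled[s_labelled == 0]).mean()
    raw = h.predict(X_target).mean()                # Classify & Count estimate
    denom = max(tpr - fpr, 1e-6)
    return float(np.clip((raw - fpr) / denom, 0.0, 1.0))

# Group prevalences among positively and negatively classified individuals.
p1_pos = acc_prevalence(h, X_aux, s_aux, X_dep[d_dep == 1])   # Pr(s=1 | d=1)
p1_neg = acc_prevalence(h, X_aux, s_aux, X_dep[d_dep == 0])   # Pr(s=1 | d=0)

# Acceptance rates per group via Bayes' rule, then the demographic-parity gap.
acc_rate = d_dep.mean()                                        # Pr(d=1)
p1 = p1_pos * acc_rate + p1_neg * (1 - acc_rate)               # Pr(s=1)
rate_s1 = p1_pos * acc_rate / max(p1, 1e-6)                    # Pr(d=1 | s=1)
rate_s0 = (1 - p1_pos) * acc_rate / max(1 - p1, 1e-6)          # Pr(d=1 | s=0)
print("Estimated demographic parity gap:", rate_s1 - rate_s0)
```

The point of the ACC correction is that the group-level estimate can remain reliable even when the attribute classifier is too inaccurate to be trusted on any single individual, which is what allows prevalence estimation to be decoupled from individual-level inference of sensitive attributes.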
