Adaptive Sampling to Reduce Disparate Performance

Existing methods for reducing the disparate performance of a classifier across demographic groups assume access to a large data set and therefore focus on the algorithmic aspect of optimizing overall performance subject to additional constraints. However, poor data collection and imbalanced data sets can severely degrade the quality of these methods. In this work, we consider a setting where data collection and optimization are performed simultaneously. In such a scenario, a natural strategy for mitigating the performance gap of the classifier is to provide additional training data drawn from the demographic groups that are worse off. We propose to follow this strategy consistently throughout the training process and to guide the resulting classifier towards equal performance on the different groups by adaptively sampling each data point from the group that is currently disadvantaged. We provide a rigorous theoretical analysis of our approach in a simplified one-dimensional setting and an extensive experimental evaluation on several real-world data sets, including a case study on the data collected during the Flint water crisis.
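
The sampling loop described above can be made concrete with a small sketch. The following Python snippet is a minimal illustration of the adaptive sampling idea, not the authors' implementation: at each round it estimates the per-group validation error of the current classifier and requests the next labeled example from the group that is currently worse off. The synthetic data generator, the two-group setup, and all hyperparameters are assumptions made purely for illustration.

```python
# Minimal sketch of adaptive sampling towards equal group performance
# (illustrative only, not the paper's exact algorithm): at every step the
# next labeled example is drawn from whichever demographic group the
# current classifier serves worse on a validation set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift, noise):
    """Synthetic 1-D group: label indicates x > shift, with label noise."""
    x = rng.normal(loc=shift, scale=1.0, size=(n, 1))
    y = (x[:, 0] > shift).astype(int)
    flip = rng.random(n) < noise
    y[flip] = 1 - y[flip]
    return x, y

# Two groups with different noise levels, so one is harder to classify.
pools = {0: make_group(5000, shift=0.0, noise=0.05),
         1: make_group(5000, shift=0.5, noise=0.25)}
val = {g: make_group(2000,
                     shift=0.0 if g == 0 else 0.5,
                     noise=0.05 if g == 0 else 0.25)
       for g in pools}

# Warm start: a few labeled points from each group.
X_train, y_train = [], []
for g, (Xg, yg) in pools.items():
    X_train.append(Xg[:20])
    y_train.append(yg[:20])
next_idx = {g: 20 for g in pools}  # next unused pool index per group

clf = LogisticRegression()

for t in range(500):
    clf.fit(np.vstack(X_train), np.concatenate(y_train))
    # Per-group validation error of the current model.
    errors = {g: 1.0 - clf.score(val[g][0], val[g][1]) for g in pools}
    # Draw the next training point from the currently disadvantaged group.
    g = max(errors, key=errors.get)
    i = next_idx[g]
    X_train.append(pools[g][0][i:i + 1])
    y_train.append(pools[g][1][i:i + 1])
    next_idx[g] += 1

print({g: round(e, 3) for g, e in errors.items()})
```

Because the noisier group accumulates errors faster, it ends up receiving most of the sampling budget, which pulls the shared decision boundary towards equal error rates across the two groups.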
