Fair Labeled Clustering

The widespread use of machine learning algorithms in settings that directly affect human lives has spurred significant interest in designing variants of these algorithms that are provably fair. Recent work in this direction has produced numerous algorithms for the fundamental problem of clustering under many different notions of fairness. Perhaps the most commonly studied family of notions is group fairness, which requires proportional group representation in every cluster. We extend this direction by considering the downstream application of clustering and how group fairness should be ensured in such a setting. Specifically, we consider a common setting in which a decision-maker runs a clustering algorithm, inspects the center of each cluster, and assigns an appropriate outcome (label) to the corresponding cluster. In hiring, for example, there could be two outcomes, positive (hire) or negative (reject), and each cluster would be assigned one of these two outcomes. To ensure group fairness in such a setting, we desire proportional group representation in every label, but not necessarily in every cluster as in group-fair clustering. We provide algorithms for such problems and show that, in contrast to their NP-hard counterparts in group-fair clustering, they admit efficient solutions. We also consider a well-motivated alternative setting in which the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space. We show that this setting exhibits interesting transitions from computationally hard to easy depending on additional constraints on the problem. Moreover, when the constraint parameters take on natural values, we give a randomized algorithm for this setting that always achieves an optimal clustering and satisfies the fairness constraints in expectation. Finally, we run experiments on real-world datasets that validate the effectiveness of our algorithms.
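
To make the label-level fairness notion concrete, the sketch below checks proportional group representation per assigned label rather than per cluster. This is a minimal illustrative check, not the paper's algorithm; the function names, the two-outcome hiring example, and the slack parameter tol are assumptions introduced here for illustration.

def label_group_proportions(cluster_of_point, label_of_cluster, group_of_point):
    """For each label, compute the fraction of its points coming from each group.

    cluster_of_point : list of cluster indices, one per data point
    label_of_cluster : dict mapping cluster index -> outcome label (e.g. "hire"/"reject")
    group_of_point   : list of group identifiers, one per data point
    """
    points_by_label = {}
    for cluster, group in zip(cluster_of_point, group_of_point):
        points_by_label.setdefault(label_of_cluster[cluster], []).append(group)
    return {
        label: {g: members.count(g) / len(members) for g in set(members)}
        for label, members in points_by_label.items()
    }


def is_label_fair(cluster_of_point, label_of_cluster, group_of_point, tol=0.1):
    """Check proportional representation per *label* (not per cluster): every
    group's share within each label must lie within tol of its overall share
    in the dataset. tol is an illustrative slack parameter, not from the paper."""
    n = len(group_of_point)
    overall = {g: group_of_point.count(g) / n for g in set(group_of_point)}
    per_label = label_group_proportions(cluster_of_point, label_of_cluster, group_of_point)
    return all(
        abs(shares.get(g, 0.0) - overall[g]) <= tol
        for shares in per_label.values()
        for g in overall
    )


# Toy hiring example: clusters 0 and 1 are labeled "hire", cluster 2 "reject".
clusters = [0, 0, 1, 1, 2, 2, 2, 2]
labels = {0: "hire", 1: "hire", 2: "reject"}
groups = ["a", "b", "a", "b", "a", "b", "a", "b"]
print(is_label_fair(clusters, labels, groups))  # True: both labels are split 50/50

Note that such an assignment can be label-fair even when individual clusters (e.g. cluster 0 alone) are not balanced, which is exactly the relaxation relative to group-fair clustering described above.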
