Proportionally Fair Clustering Revisited

In this work, we study fairness in centroid clustering. In this problem, k cluster centers must be placed given n points in a metric space, and the cost to each point is its distance to the nearest cluster center. Recent work of Chen et al. [8] introduces the notion of a proportionally fair clustering, in which no group of at least n/k points can find a new cluster center that provides lower cost to every member of the group. They propose a greedy capture algorithm which provides a (1 + √2)-approximation of proportional fairness for any metric space, and derive generalization bounds for learning a proportionally fair clustering from samples in the case where a cluster center can only be placed at one of finitely many given locations in the metric space. We focus on the case where cluster centers can be placed anywhere in the (usually infinite) metric space. For the L2 distance metric over R, we show that the approximation ratio of greedy capture improves to 2. We also show that this is due to a special property of the L2 distance; for the L1 and L∞ distances, the approximation ratio remains 1 + √2. We provide universal lower bounds which apply to all algorithms. We also consider metric spaces defined on graphs. For trees, we show that an exact proportionally fair clustering always exists and provide an efficient algorithm to find one. The corresponding question for general graphs remains open. Finally, we show that for the L2 distance, checking whether a proportionally fair clustering exists and implementing greedy capture over an infinite metric space are NP-hard, but can be solved (approximately) in special cases. We also derive generalization bounds which show that an approximately proportionally fair clustering for a large number of points can be learned from a small number of samples. Our work advances the understanding of proportional fairness in clustering and points out many avenues for future work.
2012 ACM Subject Classification: Theory of computation → Algorithmic mechanism design; Theory of computation → Facility location and clustering
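
The fairness notion stated in the abstract can be made concrete with a small check. The following is a minimal sketch, not the authors' algorithm: it assumes the candidate locations for a deviating center are given as a finite list (the paper's main setting allows centers anywhere in the metric space, which this sketch does not cover), and the names blocking_center, candidates and rho are hypothetical. A clustering is treated as rho-approximately proportionally fair if no candidate location y has at least ceil(n/k) points p with rho * d(p, y) strictly below p's current cost.

```python
import math


def blocking_center(points, centers, candidates, k, rho=1.0, dist=math.dist):
    """Return a candidate location that witnesses a violation, or None.

    A candidate y is a witness if at least ceil(n/k) points p satisfy
    rho * dist(p, y) < (distance from p to its nearest current center).
    Only the finite list `candidates` is searched (an assumption of this sketch).
    """
    n = len(points)
    group_size = math.ceil(n / k)
    # Current cost of each point: distance to its nearest center.
    cost = [min(dist(p, c) for c in centers) for p in points]
    for y in candidates:
        # Count the points that would strictly gain by deviating to y.
        gainers = sum(1 for p, c in zip(points, cost) if rho * dist(p, y) < c)
        if gainers >= group_size:
            return y
    return None


# Two well-separated groups of three points each, with k = 2.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]

# Both centers near the first group: the second group can deviate together.
unfair = blocking_center(points, [(0, 0), (1, 1)], points, k=2)

# One center per group: no candidate attracts a group of three points.
fair = blocking_center(points, [(0, 0), (10, 10)], points, k=2)
```

Because a deviating group only needs to be large enough, counting the points that strictly prefer y suffices; there is no need to enumerate subsets. Passing rho = 1 + math.sqrt(2) tests the approximation level that the abstract attributes to greedy capture in general metric spaces.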

[1]  Christopher T. Lowenkamp, et al. False Positives, False Negatives, and False Analyses: A Rejoinder to "Machine Bias: There's Software Used across the Country to Predict Future Criminals. and It's Biased against Blacks", 2016.

[2]  Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[3]  Noga Alon, et al. Strategyproof Approximation of the Minimax on Networks, 2010, Math. Oper. Res.

[4]  Percy Liang, et al. Fairness Without Demographics in Repeated Loss Minimization, 2018, ICML.

[5]  Melanie Schmidt, et al. Privacy preserving clustering with constraints, 2018, ICALP.

[6]  Seth Neel, et al. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness, 2017, ICML.

[7]  L. Shapley, et al. On cores and indivisibility, 1974.

[8]  Stephen T. Hedetniemi, et al. A theorem of Ore and self-stabilizing algorithms for disjoint minimal dominating sets, 2015, Theor. Comput. Sci.

[9]  Krzysztof Onak, et al. Scalable Fair Clustering, 2019, ICML.

[10]  Vladimir Vapnik, Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[11]  Kamesh Munagala, et al. Proportionally Fair Clustering, 2019, ICML.

[12]  Nathan Srebro, et al. Equality of Opportunity in Supervised Learning, 2016, NIPS.

[13]  Krishna P. Gummadi, et al. Fairness Constraints: A Flexible Approach for Fair Classification, 2019, J. Mach. Learn. Res.

[14]  Kamesh Munagala, et al. Fair Allocation of Indivisible Public Goods, 2018, EC.

[15]  Anupam Gupta, et al. Approximation Algorithms for Aversion k-Clustering via Local k-Median, 2016, ICALP.

[16]  Fred B. Schneider, et al. A Theory of Graphs, 1993.

[17]  Vincent Conitzer, et al. Group Fairness for the Allocation of Indivisible Goods, 2019, AAAI.

[18]  H. Varian. Equity, Envy and Efficiency, 1974.

[19]  Michal Feldman, et al. Strategyproof facility location and the least squares objective, 2013, EC '13.

[20]  Nabil H. Mustafa, et al. Tight Lower Bounds on the VC-dimension of Geometric Set Systems, 2019, J. Mach. Learn. Res.

[21]  Miroslav Dudík, et al. Fair Regression: Quantitative Definitions and Reduction-based Algorithms, 2019, ICML.

[22]  et al. Algorithmic and Economic Perspectives on Fairness, 2019, ArXiv.

[23]  Deeparnab Chakrabarty, et al. Fair Algorithms for Clustering, 2019, NeurIPS.

[24]  Michael B. Partensky, et al. Spatial Localization Problem and the Circle of Apollonius, 2007.

[25]  Toon Calders, et al. Building Classifiers with Independency Constraints, 2009, 2009 IEEE International Conference on Data Mining Workshops.

[26]  Dimitris Fotakis. Incremental algorithms for Facility Location and k-Median, 2006, Theor. Comput. Sci.

[27]  Tony Doyle, et al. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, 2017, Inf. Soc.

[28]  David C. Parkes, et al. Fairness without Harm: Decoupled Classifiers with Preference Guarantees, 2019, ICML.

[29]  Hervé Moulin, et al. Fair division and collective welfare, 2003.

[30]  Nisarg Shah, et al. Designing Fairly Fair Classifiers Via Economic Fairness Notions, 2020, WWW.

[31]  Silvio Lattanzi, et al. Fair Clustering Through Fairlets, 2018, NIPS.

[32]  Kristina Lerman, et al. A Survey on Bias and Fairness in Machine Learning, 2019, ACM Comput. Surv.

[33]  V. V. Shenmaier, et al. The problem of a minimal ball enclosing k points, 2013.

[34]  Toniann Pitassi, et al. Learning Fair Representations, 2013, ICML.