Clustering with Fair-Center Representation: Parameterized Approximation Algorithms and Heuristics

We study diversity-aware clustering, a variant of classical clustering formulations motivated by algorithmic fairness. In this variant we are given a collection of facility subsets, and a solution must contain at least a specified number of facilities from each subset while minimizing the clustering objective (k-median or k-means). We investigate the fixed-parameter tractability of these problems and establish several hardness and inapproximability results, which hold even when running time exponential in some of the parameters is allowed. Motivated by these results, we identify natural parameters of the problem and present fixed-parameter approximation algorithms with approximation ratios $(1 + \frac{2}{e} + \epsilon)$ and $(1 + \frac{8}{e} + \epsilon)$ for diversity-aware k-median and diversity-aware k-means, respectively. We argue that these ratios are essentially tight assuming the gap-exponential time hypothesis (Gap-ETH). We also present a simple and more practical bicriteria approximation algorithm with better running-time bounds. Finally, we propose efficient and practical heuristics, and we evaluate the scalability and effectiveness of our methods in a wide variety of rigorously conducted experiments on both real and synthetic data.
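To make the problem statement concrete, the following minimal sketch spells out the constraint and the k-median objective on a toy instance. It is not one of the paper's algorithms: it is an exhaustive search that is only feasible for tiny inputs, and all names (`kmedian_cost`, `is_diverse`, `brute_force_diverse_kmedian`) and the example data are assumptions chosen for illustration. For the k-means objective one would sum squared distances instead.

```python
from itertools import combinations


def kmedian_cost(clients, centers, dist):
    """Sum over clients of the distance to the nearest chosen center."""
    return sum(min(dist(c, f) for f in centers) for c in clients)


def is_diverse(centers, groups, requirements):
    """True if at least requirements[i] chosen centers belong to groups[i]."""
    chosen = set(centers)
    return all(len(chosen & g) >= r for g, r in zip(groups, requirements))


def brute_force_diverse_kmedian(clients, facilities, groups, requirements, k, dist):
    """Try every k-subset of facilities; keep the cheapest feasible one.

    Illustrative only: running time grows as (number of facilities choose k).
    Returns (cost, centers), or None if no k-subset meets the requirements.
    """
    best = None
    for centers in combinations(facilities, k):
        if not is_diverse(centers, groups, requirements):
            continue
        cost = kmedian_cost(clients, centers, dist)
        if best is None or cost < best[0]:
            best = (cost, centers)
    return best


# Hypothetical toy instance: points on a line, distance = absolute difference.
clients = [0, 1, 2, 10, 11, 12]
facilities = [1, 11, 5, 6]
groups = [{1, 11}, {5, 6}]      # two facility subsets
requirements = [1, 1]           # at least one center from each subset


def line_dist(a, b):
    return abs(a - b)


unconstrained = brute_force_diverse_kmedian(clients, facilities, [], [], 2, line_dist)
constrained = brute_force_diverse_kmedian(
    clients, facilities, groups, requirements, 2, line_dist
)
```

On this instance the unconstrained optimum picks both centers from the first subset, while the diversity requirement forces one center from the second subset and raises the cost. This gap is the price of the representation constraint that the approximation algorithms aim to keep small.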
