A Pairwise Fair and Community-preserving Approach to k-Center Clustering

Clustering is a foundational problem in machine learning with numerous applications. As machine learning becomes more ubiquitous as a backend for automated systems, concerns about fairness arise. Much of the current literature on fairness deals with discrimination against protected classes in supervised learning (group fairness). We define a different notion of fair clustering, in which the probability that two points (or a community of points) become separated is bounded by an increasing function of their pairwise distance (or community diameter). This captures settings where data points represent people who gain some benefit from being clustered together. Unfairness arises when certain points are deterministically separated, either arbitrarily or by someone who intends to harm them, as in the gerrymandering of election districts. In response, we formally define two new types of fairness in the clustering setting: pairwise fairness and community preservation. To explore the practicality of our fairness goals, we devise an approach for extending existing $k$-center algorithms to satisfy these fairness constraints. Our analysis of this approach proves that reasonable approximations can be achieved while maintaining fairness. In experiments, we compare the effectiveness of our approach to classical $k$-center algorithms and heuristics, and explore the tradeoff between optimal clustering and fairness.
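To make the notion of "deterministic separation" concrete, the following is a minimal sketch and not the approach proposed in this work. It runs the classical greedy farthest-first heuristic for $k$-center from a random starting point and estimates, over repeated runs, how often a given pair of points ends up in different clusters. The function names (`farthest_first`, `assign`, `separation_rate`), the Euclidean metric, and the toy data are illustrative assumptions.

```python
import math
import random

def farthest_first(points, k, rng):
    """Greedy farthest-first traversal, a classical k-center heuristic.

    Returns the indices of the k chosen centers. The first center is
    picked uniformly at random (an assumption made for this sketch).
    """
    centers = [rng.randrange(len(points))]
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(p, points[centers[0]]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(centers, key=lambda c: math.dist(p, points[c])) for p in points]

def separation_rate(points, k, i, j, trials=200, seed=0):
    """Empirical fraction of runs in which points i and j are separated."""
    rng = random.Random(seed)
    separated = 0
    for _ in range(trials):
        labels = assign(points, farthest_first(points, k, rng))
        separated += labels[i] != labels[j]
    return separated / trials

# Hypothetical toy data: two tight groups far apart on a line.
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
near = separation_rate(pts, 2, 0, 1)  # pair at distance 0.1
far = separation_rate(pts, 2, 1, 2)   # pair at distance 9.9
```

On this toy instance the nearby pair is never separated and the distant pair always is. A pairwise fairness guarantee of the kind described above would instead bound the separation probability of every pair by an increasing function of its distance, which a heuristic like this one does not promise in general.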
