A New Notion of Individually Fair Clustering: $\alpha$-Equitable $k$-Center

Clustering is a fundamental problem in unsupervised machine learning, and fair variants of it have recently received significant attention. In this work we introduce a novel definition of fairness for clustering problems. Specifically, in our model each point $j$ has a set $S_j$ of other points that it perceives as similar to itself, and it feels fairly treated if the quality of service it receives in the solution is $\alpha$-close to that of the points in $S_j$. We begin our study by answering questions regarding the structure of the problem, namely for what values of $\alpha$ the problem is well-defined, and how the Price of Fairness (PoF) behaves. For the well-defined region of $\alpha$, we provide efficient and easily implementable approximation algorithms for the $k$-center objective, which in certain cases also enjoy bounded PoF guarantees. Finally, we complement our analysis with an extensive suite of experiments that validates the effectiveness of our theoretical results.
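
To make the fairness notion concrete, the following minimal sketch checks whether a given set of centers treats every point fairly. It rests on assumptions that the abstract does not fix: quality of service is taken to be a point's Euclidean distance to its nearest center, and "α-close" is read as requiring a point's distance to be at most α times the distance of each point in its similarity set. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def service_cost(point, centers):
    """Distance from a point to its nearest center (assumed quality of service)."""
    return min(math.dist(point, c) for c in centers)

def k_center_cost(points, centers):
    """k-center objective: the largest service cost over all points."""
    return max(service_cost(p, centers) for p in points)

def is_alpha_equitable(points, centers, similar, alpha):
    """Check the assumed per-point reading of the fairness condition.

    similar maps a point index j to the indices in its similarity set S_j.
    The solution passes if cost(j) <= alpha * cost(i) for every i in S_j.
    """
    costs = [service_cost(p, centers) for p in points]
    for j, s_j in similar.items():
        for i in s_j:
            if costs[j] > alpha * costs[i]:
                return False
    return True

# Toy instance on a line, with two candidate solutions.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
centers_on_points = [(1.0, 0.0), (4.0, 0.0)]
centers_midway = [(0.5, 0.0), (4.5, 0.0)]

# Point 0 compares itself with point 1, which coincides with a center.
similar = {0: [1]}

print(k_center_cost(points, centers_on_points))                     # objective value
print(is_alpha_equitable(points, centers_on_points, similar, 2.0))  # point 1 has cost 0
print(is_alpha_equitable(points, centers_midway, similar, 2.0))     # equal costs
```

Under this reading, a point whose similar point sits exactly on a center can never be satisfied for any finite α unless it is also on a center, which hints at why the range of α for which the problem is well-defined needs separate study.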
