Fair Correlation Clustering

In this paper, we study correlation clustering under fairness constraints. Fair variants of k-median and k-center clustering have been studied recently, and approximation algorithms based on a notion called fairlet decomposition have been proposed for them. We obtain approximation algorithms for fair correlation clustering under several important types of fairness constraints. Our results hinge on obtaining a fairlet decomposition for correlation clustering by introducing a novel combinatorial optimization problem. We define a fairlet decomposition whose cost is similar to the k-median cost, which allows us to obtain approximation algorithms for a wide range of fairness constraints. We complement our theoretical results with an in-depth analysis of our algorithms on real graphs, where we show that fair solutions to correlation clustering can be obtained with a limited increase in cost compared to state-of-the-art (unfair) algorithms.
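To make the trade-off between clustering cost and fairness concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the standard disagreement objective for correlation clustering on a complete signed graph (positive pairs that are split plus negative pairs that are merged) and one simple fairness notion, equal representation of every color in each cluster. The function names `disagreement_cost` and `is_balanced`, and the toy graph, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def disagreement_cost(positive_edges, clustering):
    """Count disagreements on a complete signed graph.

    positive_edges: set of frozenset({u, v}); every other pair is negative.
    clustering: dict mapping node -> cluster id.
    """
    cost = 0
    for u, v in combinations(sorted(clustering), 2):
        same_cluster = clustering[u] == clustering[v]
        positive = frozenset((u, v)) in positive_edges
        # A positive pair should be together, a negative pair apart.
        if positive != same_cluster:
            cost += 1
    return cost


def is_balanced(clustering, color):
    """Assumed fairness notion: each cluster holds equally many nodes of every color."""
    all_colors = set(color.values())
    per_cluster = {}
    for node, cluster_id in clustering.items():
        per_cluster.setdefault(cluster_id, Counter())[color[node]] += 1
    return all(
        len({counts[c] for c in all_colors}) == 1
        for counts in per_cluster.values()
    )


# Toy instance (hypothetical): two red and two blue nodes,
# with positive edges only inside each color class.
color = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
positive_edges = {frozenset(("a", "b")), frozenset(("c", "d"))}

unfair = {"a": 0, "b": 0, "c": 1, "d": 1}  # follows the edge signs exactly
fair = {"a": 0, "c": 0, "b": 1, "d": 1}    # one red and one blue per cluster

print(disagreement_cost(positive_edges, unfair), is_balanced(unfair, color))
print(disagreement_cost(positive_edges, fair), is_balanced(fair, color))
```

On this toy instance the clustering that matches the edge signs has no disagreements but is unbalanced, while the balanced clustering pays for every pair it has to place against its sign. The paper's experiments concern how small this kind of increase can be kept on real graphs.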
