Fair Correlation Clustering

In this paper, we study correlation clustering under fairness constraints. Fair variants of $k$-median and $k$-center clustering have been studied recently, and approximation algorithms based on a notion called fairlet decomposition have been proposed for them. We obtain approximation algorithms for fair correlation clustering under several important types of fairness constraints. Our results hinge on obtaining a fairlet decomposition for correlation clustering, which we do by introducing a novel combinatorial optimization problem. We define a fairlet decomposition whose cost is similar to the $k$-median cost, which allows us to obtain approximation algorithms for a wide range of fairness constraints. We complement our theoretical results with an in-depth analysis of our algorithms on real graphs, where we show that fair solutions to correlation clustering can be obtained with a limited increase in cost compared to state-of-the-art (unfair) algorithms.
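To make the trade-off between cost and fairness concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the standard disagreement objective on a complete signed graph (a positive edge cut by the clustering, or a negative edge kept inside a cluster, each cost 1) and, purely for illustration, a two-color constraint requiring equal numbers of each color in every cluster. The names `disagreement_cost`, `is_balanced`, and the toy instance are hypothetical.

```python
from collections import Counter
from itertools import combinations


def disagreement_cost(nodes, positive_edges, clustering):
    """Count disagreements on a complete signed graph.

    positive_edges: set of frozenset({u, v}); every other pair is negative.
    clustering: dict mapping node -> cluster id.
    """
    cost = 0
    for u, v in combinations(nodes, 2):
        same_cluster = clustering[u] == clustering[v]
        is_positive = frozenset((u, v)) in positive_edges
        # A pair disagrees if it is positive but separated,
        # or negative but placed together.
        if is_positive != same_cluster:
            cost += 1
    return cost


def is_balanced(clustering, color):
    """Illustrative fairness check (an assumption, not the paper's definition):
    every cluster holds equally many "red" and "blue" nodes."""
    counts = {}
    for node, cluster_id in clustering.items():
        counts.setdefault(cluster_id, Counter())[color[node]] += 1
    return all(c["red"] == c["blue"] for c in counts.values())


# Toy instance: two positive edges, each joining nodes of the same color.
nodes = ["a", "b", "c", "d"]
positive_edges = {frozenset(("a", "b")), frozenset(("c", "d"))}
color = {"a": "red", "b": "red", "c": "blue", "d": "blue"}

unfair = {"a": 0, "b": 0, "c": 1, "d": 1}  # follows the edge signs exactly
fair = {"a": 0, "c": 0, "b": 1, "d": 1}    # one red and one blue per cluster

unfair_cost = disagreement_cost(nodes, positive_edges, unfair)
fair_cost = disagreement_cost(nodes, positive_edges, fair)
```

On this toy instance the unconstrained clustering has zero disagreements but violates the balance constraint, while the balanced clustering pays for every pair it places against its sign. The instance is deliberately adversarial; the abstract reports that on real graphs the increase in cost is limited.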
