Fair Clustering with Fair Correspondence Distribution

Abstract In recent years, fairness has become an important issue in machine learning. In clustering, fairness is defined as consistency of balance: the ratio of data points with different sensitive attribute values should remain the same in every cluster. Fairness matters in real-world applications. For example, when a recommendation system provides targeted advertisements or job offers based on a clustering of candidates, a minority group may not receive the same level of opportunity as the majority group if the clustering is unfair. In this study, we propose a novel distribution-based approach to fair clustering. Treating the observed sample as drawn from a distribution biased by society, we seek clusters from a fair correspondence distribution. Our method uses the support vector method and a dynamical system to divide the entire data space into atomic cells, and then reassembles the cells fairly to form clusters. Our theoretical results give an upper bound on the generalization error of the corresponding clustering function in the fair correspondence distribution when atomic cells are connected fairly, which leads to an algorithm for achieving fairness. Experimental results on various datasets show that our algorithm increases fairness while reducing computation time.
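The balance criterion described in the abstract can be made concrete with a small sketch. The abstract does not give an exact formula, so the code below assumes a binary sensitive attribute and the min-ratio definition of balance common in the fair clustering literature. The function name `cluster_balance` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cluster_balance(labels, sensitive):
    """Return {cluster: balance} for a binary sensitive attribute.

    Assumption: balance of a cluster is min(a/b, b/a), where a and b are
    the counts of the two sensitive groups in that cluster. A cluster that
    is missing one group gets balance 0.0.
    """
    groups = sorted(set(sensitive))
    if len(groups) != 2:
        raise ValueError("this sketch assumes exactly two sensitive groups")

    # Count members of each sensitive group inside each cluster.
    counts = {}
    for cluster, group in zip(labels, sensitive):
        counts.setdefault(cluster, Counter())[group] += 1

    balance = {}
    for cluster, cnt in counts.items():
        a, b = cnt[groups[0]], cnt[groups[1]]
        balance[cluster] = 0.0 if min(a, b) == 0 else min(a / b, b / a)
    return balance


# Toy data: cluster 0 holds two points of each group,
# cluster 1 holds three "a" points and one "b" point.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
sensitive = ["a", "a", "b", "b", "a", "a", "a", "b"]
print(cluster_balance(labels, sensitive))
```

Under this assumed definition, a clustering is fairer the closer every cluster's balance is to the balance of the whole dataset. In the toy data the first cluster is perfectly balanced, while the second is skewed toward one group.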
