Representativity Fairness in Clustering

Incorporating fairness constructs into machine learning algorithms is a topic of much societal importance and recent interest. Clustering, a fundamental task in unsupervised learning that manifests across a number of web data scenarios, has also been a subject of attention within fair ML research. In this paper, we develop a novel notion of fairness in clustering, called representativity fairness. Representativity fairness is motivated by the need to alleviate disparity in objects’ proximity to their assigned cluster representatives, to aid fairer decision making. We illustrate the importance of representativity fairness in real-world decision-making scenarios involving clustering and provide ways of quantifying objects’ representativity and fairness over it. We develop a new clustering formulation, RFKM, that aims to optimize for representativity fairness along with clustering quality. Inspired by the K-Means framework, RFKM incorporates novel loss terms into its objective function. The RFKM objective and optimization approach guide it towards clustering configurations that yield higher representativity fairness. Through an empirical evaluation over a variety of public datasets, we establish the effectiveness of our method. We illustrate that we are able to significantly improve representativity fairness with only marginal impact on clustering quality.
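To make the idea of disparity in proximity to cluster representatives concrete, the following is a minimal sketch and not the RFKM formulation itself, whose loss terms are not given in this abstract. It assumes Euclidean distance to the assigned centroid as a stand-in for an object's representativity, and it uses variance, maximum distance, and Jain's fairness index as illustrative disparity summaries. These measures and all function names are assumptions made for the example.

```python
import numpy as np


def assign_to_nearest(points, centroids):
    """Return each point's cluster label and its distance to the assigned centroid."""
    # Pairwise Euclidean distances, shape (n_points, n_clusters).
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return labels, dists[np.arange(len(points)), labels]


def lloyd_kmeans(points, k, n_iter=50, seed=0):
    """Plain K-Means via Lloyd's iterations (a baseline, NOT the RFKM method)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels, _ = assign_to_nearest(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def representativity_disparity(dist_to_rep):
    """Summarise how unevenly objects sit relative to their representatives.

    Assumption: a smaller distance means the object is better represented.
    Jain's index is (sum x)^2 / (n * sum x^2); it equals 1 when all
    distances are identical and approaches 1/n when one distance dominates.
    """
    d = np.asarray(dist_to_rep, dtype=float)
    sum_sq = float(np.sum(d ** 2))
    jain = 1.0 if sum_sq == 0.0 else float(np.sum(d) ** 2 / (len(d) * sum_sq))
    return {
        "mean": float(d.mean()),
        "variance": float(d.var()),
        "max": float(d.max()),
        "jain_index": jain,
    }


if __name__ == "__main__":
    # Synthetic data: one tight blob and one spread-out blob (hypothetical example).
    rng = np.random.default_rng(42)
    tight = rng.normal(loc=0.0, scale=0.2, size=(100, 2))
    loose = rng.normal(loc=5.0, scale=2.0, size=(100, 2))
    data = np.vstack([tight, loose])

    centroids = lloyd_kmeans(data, k=2)
    _, dist_to_rep = assign_to_nearest(data, centroids)
    print(representativity_disparity(dist_to_rep))
```

In this sketch, a low mean distance corresponds to good clustering quality in the K-Means sense, while a high variance or a low Jain's index indicates that some objects are much farther from their representatives than others. A fairness-aware objective of the kind described above would trade a little of the former for an improvement in the latter.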
