Paper Information - Socially Fair k-Means Clustering

Socially Fair k-Means Clustering

We show that the popular k-means clustering algorithm (Lloyd's heuristic), used for a variety of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., demographic groups). Such biased clusterings can have deleterious implications for human-centric applications such as resource allocation. We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups. The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means, inheriting its simplicity, efficiency, and stability. In comparison with standard Lloyd's, we find that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.
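The notion of "equitable costs for different groups" can be made concrete with a small sketch. The abstract does not spell out the objective, so the code below assumes that a group's cost is the average squared distance from its points to their nearest center, and that the fair objective is the largest such group cost. The function names (`assign`, `group_costs`, `fair_cost`) and the toy data are hypothetical and are not taken from the paper. The sketch only evaluates a given set of centers; it does not implement Fair-Lloyd's center update.

```python
import numpy as np


def assign(points, centers):
    """Return (nearest-center index, squared distance to it) for every point."""
    # Pairwise squared Euclidean distances, shape (n_points, n_centers).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def group_costs(points, groups, centers):
    """Average squared distance to the nearest center, computed per group.

    Assumption: this per-group average is the "cost" a group incurs.
    """
    _, d2 = assign(points, centers)
    return {g.item(): float(d2[groups == g].mean()) for g in np.unique(groups)}


def fair_cost(points, groups, centers):
    """Assumed fair objective: the worst (largest) average cost over groups."""
    return max(group_costs(points, groups, centers).values())


# Toy data: two points in group 0, one point in group 1, a single center.
points = np.array([[0.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
groups = np.array([0, 0, 1])
centers = np.array([[0.0, 1.0]])

per_group = group_costs(points, groups, centers)
worst = fair_cost(points, groups, centers)
```

In this toy case the center sits at the mean of group 0, so group 1 bears a much higher average cost than group 0. A fairness-aware update would shift the center toward group 1 until the two group costs balance, which is the behavior the abstract describes for Fair-Lloyd.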
