Fast and Differentially Private Fair Clustering

This study presents the first differentially private fair clustering method, built on a recently proposed density-based fair clustering approach. The method addresses a limitation of existing fair clustering algorithms, which require sensitive personal information during the training or inference phase. Two novel components, a Gaussian mixture density function and Voronoi cells, are proposed to improve the method's privacy, fairness, and utility relative to previous methods. Experimental results on both synthetic and real-world data confirm that the proposed method is compatible with differential privacy and that, when privacy is not considered, it achieves a better fairness-utility trade-off than existing methods. Moreover, the proposed method requires significantly less computation time, running at least 3.7 times faster than the state of the art.
