Differentially-Private Sublinear-Time Clustering

Clustering is an essential primitive in unsupervised machine learning. We introduce the problem of sublinear-time differentially private clustering as a natural and well-motivated research direction. We combine the sublinear-time $k$-means and $k$-median results of Mishra et al. (SODA, 2001) and of Czumaj and Sohler (Rand. Struct. and Algorithms, 2007) with recent results on private clustering by Balcan et al. (ICML, 2017), Gupta et al. (SODA, 2010), and Ghazi et al. (NeurIPS, 2020) to obtain sublinear-time private $k$-means and $k$-median algorithms via subsampling. We also investigate the privacy benefits of subsampling for group privacy.
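As a rough illustration of why subsampling can help privacy as well as running time, the sketch below computes the standard privacy-amplification-by-subsampling bound for pure differential privacy: running an $\varepsilon$-DP mechanism on a random subsample that includes each record with probability $q$ yields a guarantee of $\ln(1 + q(e^{\varepsilon} - 1))$. The function name `amplified_epsilon` and the example numbers are assumptions made for illustration only; this is the generic bound, not the specific analysis or the group-privacy result of the paper.

```python
import math


def amplified_epsilon(eps: float, q: float) -> float:
    """Privacy parameter after subsampling (generic textbook bound).

    Assumes the base mechanism is eps-DP (pure DP) and that each record
    enters the subsample with probability q. Hypothetical helper, not
    taken from the paper.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # ln(1 + q * (e^eps - 1)), computed stably for small eps and q.
    return math.log1p(q * math.expm1(eps))


if __name__ == "__main__":
    n = 1_000_000   # illustrative dataset size (assumption)
    m = 10_000      # illustrative subsample size (assumption)
    eps = 1.0       # privacy parameter of the base clustering mechanism
    print(amplified_epsilon(eps, m / n))
```

For small $q$ the amplified parameter is approximately $q(e^{\varepsilon} - 1)$, so a subsample that is a small fraction of the data shrinks the effective privacy parameter roughly in proportion to the sampling rate.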
