Differentially Private Clustering: Tight Approximation Ratios

We study the task of differentially private clustering. For several basic clustering problems, including Euclidean DensestBall, 1-Cluster, k-means, and k-median, we give efficient differentially private algorithms that achieve essentially the same approximation ratios as those obtainable by any non-private algorithm, while incurring only small additive errors. This improves upon existing efficient algorithms, which achieve only large constant approximation factors. Our results also imply an improved algorithm for the Sample and Aggregate privacy framework. Furthermore, we show that one of the tools used in our 1-Cluster algorithm can be employed to obtain a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
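
The phrase "approximation ratio with additive error" can be made concrete with the standard definitions below. This is an illustrative sketch for k-means, not the paper's exact formal statement: the symbols $w$, $t$, $\varepsilon$, $\delta$, $\mathcal{A}$, and $\mathrm{OPT}$ are notation assumed here for exposition, and the abstract does not specify the size of the additive term.

```latex
% Assumed notation: X = (x_1, ..., x_n) is the input in R^d,
% C is a set of k candidate centers, A is a randomized algorithm.

% k-means cost of a center set C (for k-median, drop the square):
\[
  \mathrm{cost}_X(C) \;=\; \sum_{i=1}^{n} \min_{c \in C} \lVert x_i - c \rVert_2^{2},
  \qquad
  \mathrm{OPT} \;=\; \min_{|C| = k} \mathrm{cost}_X(C).
\]

% Multiplicative ratio w together with additive error t:
\[
  \mathrm{cost}_X\bigl(\mathcal{A}(X)\bigr) \;\le\; w \cdot \mathrm{OPT} + t .
\]

% (epsilon, delta)-differential privacy: for all inputs X, X' that differ
% in a single point, and for every set S of possible outputs,
\[
  \Pr\bigl[\mathcal{A}(X) \in S\bigr]
  \;\le\; e^{\varepsilon} \, \Pr\bigl[\mathcal{A}(X') \in S\bigr] + \delta .
\]
```

Read this way, the claim in the abstract is that the private algorithms achieve a ratio $w$ essentially matching the best non-private ratio, with the cost of privacy absorbed into the additive term $t$.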
