Differentially private anonymized histograms

For a dataset of label-count pairs, an anonymized histogram is the multiset of counts. Anonymized histograms appear in various potentially sensitive contexts such as password-frequency lists, degree distribution in social networks, and estimation of symmetric properties of discrete distributions. Motivated by these applications, we propose the first differentially private mechanism to release anonymized histograms that achieves near-optimal privacy utility trade-off both in terms of number of items and the privacy parameter. Further, if the underlying histogram is given in a compact format, the proposed algorithm runs in time sub-linear in the number of items. For anonymized histograms generated from unknown discrete distributions, we show that the released histogram can be directly used for estimating symmetric properties of the underlying distribution.

[1]  Alon Orlitsky,et al.  A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions , 2017, ICML.

[2]  Bonneau Joseph,et al.  Yahoo Password Frequency Corpus , 2015 .

[3]  Himanshu Tyagi,et al.  The Complexity of Estimating Rényi Entropy , 2015, SODA.

[4]  Nabil R. Adam,et al.  Security-control methods for statistical databases: a comparative study , 1989, ACM Comput. Surv..

[5]  Liam Paninski,et al.  Estimation of Entropy and Mutual Information , 2003, Neural Computation.

[6]  Alon Orlitsky,et al.  Data Amplification: Instance-Optimal Property Estimation , 2019, ICML.

[7]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[8]  Donald F. Towsley,et al.  Resisting structural re-identification in anonymized social networks , 2010, The VLDB Journal.

[9]  Aleksandra B. Slavkovic,et al.  Differentially Private Graphical Degree Sequences and Synthetic Graphs , 2012, Privacy in Statistical Databases.

[10]  Martin J. Wainwright,et al.  Local privacy and statistical minimax rates , 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[11]  Sofya Raskhodnikova,et al.  Efficient Lipschitz Extensions for High-Dimensional Graph Statistics and Node Private Degree Distributions , 2015, ArXiv.

[12]  Gregory Valiant,et al.  Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs , 2011, STOC '11.

[13]  Himanshu Tyagi,et al.  Estimating Renyi Entropy of Discrete Distributions , 2014, IEEE Transactions on Information Theory.

[14]  Ninghui Li,et al.  Publishing Graph Degree Distribution with Node Differential Privacy , 2016, SIGMOD Conference.

[15]  David D. Jensen,et al.  Accurate Estimation of the Degree Distribution of Private Networks , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[16]  Huanyu Zhang,et al.  INSPECTRE: Privately Estimating the Unseen , 2018, ICML.

[17]  Hans Ulrich Simon,et al.  A lower bound on the release of differentially private integer partitions , 2018, Inf. Process. Lett..

[18]  Ronitt Rubinfeld,et al.  Testing that distributions are close , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[19]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[20]  H. D. Brunk,et al.  Statistical inference under order restrictions : the theory and application of isotonic regression , 1973 .

[21]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[22]  Alon Orlitsky,et al.  Data Amplification: A Unified and Competitive Approach to Property Estimation , 2019, NeurIPS.

[23]  Sofya Raskhodnikova,et al.  What Can We Learn Privately? , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[24]  Dan Suciu,et al.  Boosting the accuracy of differentially private histograms through consistency , 2009, Proc. VLDB Endow..

[25]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[26]  Sofya Raskhodnikova,et al.  Analyzing Graphs with Node Differential Privacy , 2013, TCC.

[27]  G. Hardy,et al.  Asymptotic Formulaæ in Combinatory Analysis , 1918 .

[28]  Huanyu Zhang,et al.  Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication , 2018, AISTATS.

[29]  James Zou,et al.  Quantifying the unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects , 2015, bioRxiv.

[30]  A. Suresh,et al.  Optimal prediction of the number of unseen species , 2016, Proceedings of the National Academy of Sciences.

[31]  Joseph Bonneau,et al.  Differentially Private Password Frequency Lists , 2016, NDSS.

[32]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.

[33]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[34]  Tim Roughgarden,et al.  Universally utility-maximizing privacy mechanisms , 2008, STOC '09.

[35]  Samson Zhou,et al.  On the Economics of Offline Password Cracking , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[36]  Gregory Valiant,et al.  The Power of Linear Estimators , 2011, 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science.

[37]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[38]  J. Leeuw,et al.  Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods , 2009 .

[39]  Jeremiah Blocki Microsoft Differentially Private Integer Partitions and their Applications , 2016 .

[40]  Joseph Bonneau,et al.  The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords , 2012, 2012 IEEE Symposium on Security and Privacy.

[41]  Alon Orlitsky,et al.  The Broad Optimality of Profile Maximum Likelihood , 2019, NeurIPS.

[42]  Yihong Wu,et al.  Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation , 2014, IEEE Transactions on Information Theory.

[43]  Pramod Viswanath,et al.  Extremal Mechanisms for Local Differential Privacy , 2014, J. Mach. Learn. Res..

[44]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[45]  Cynthia Dwork,et al.  Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography , 2007, WWW '07.

[46]  Yanjun Han,et al.  Minimax Estimation of Functionals of Discrete Distributions , 2014, IEEE Transactions on Information Theory.

[47]  Alon Orlitsky,et al.  On Modeling Profiles Instead of Values , 2004, UAI.