PrivKV: Key-Value Data Collection with Local Differential Privacy

Local differential privacy (LDP), where each user perturbs her data locally before sending it to an untrusted data collector, is a new and promising technique for privacy-preserving distributed data collection. The advantage of LDP is that it enables the collector to obtain accurate statistical estimates about sensitive user data (e.g., location and app usage) without accessing the raw data. However, existing work on LDP is limited to simple data types, such as categorical, numerical, and set-valued data. To the best of our knowledge, there is no existing LDP work on key-value data, an extremely popular NoSQL data model that generalizes both set-valued and numerical data. In this paper, we study the problem of frequency and mean estimation on key-value data, first designing a baseline approach, PrivKV, within the same "perturbation-calibration" paradigm as existing LDP techniques. To address the poor estimation accuracy caused by the uninformed perturbation performed by users, we then propose two iterative solutions, PrivKVM and PrivKVM+, which gradually improve the estimation results over a series of iterations. We also present an optimization strategy that reduces network latency and increases estimation accuracy by introducing virtual iterations on the collector side, without user involvement. We verify the correctness and effectiveness of these solutions through theoretical analysis and extensive experiments.
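
The perturbation-calibration paradigm for key-value data can be illustrated for a single key. What follows is a minimal sketch under stated assumptions, not the paper's exact protocol or estimator: each user is assumed to have already sampled one key; a value v in [-1, 1] is discretized to a +1/-1 bit whose expectation is v and then passed through binary randomized response with budget eps1; key presence is passed through randomized response with budget eps2; and a user who lacks the key fabricates a uniformly random +1/-1 value bit, an uninformed choice in the spirit of the baseline. The helper names `keep_prob`, `perturb`, and `calibrate` are hypothetical, and the calibration formulas are derived here from the expectations of these steps. Under these assumptions, each report is (eps1 + eps2)-locally differentially private for the sampled key.

```python
import math
import random
from typing import List, Optional, Tuple

# A report is (key bit, value bit); (0, 0) means "key reported as not held".
Report = Tuple[int, int]


def keep_prob(eps: float) -> float:
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(eps) / (1 + math.exp(eps))


def perturb(value: Optional[float], eps1: float, eps2: float,
            rng: random.Random) -> Report:
    """Perturb one user's entry for an already-sampled key (hypothetical helper).

    value is the user's value in [-1, 1], or None if the user lacks the key.
    eps1 protects the value bit and eps2 protects key presence (assumed split).
    """
    p1, p2 = keep_prob(eps1), keep_prob(eps2)
    if value is None:
        # Key absent: report "absent" w.p. p2; otherwise fabricate the key with
        # a uniformly random fake value bit (an uninformed choice).
        if rng.random() < p2:
            return (0, 0)
        return (1, rng.choice((1, -1)))
    # Discretize v to +1 w.p. (1 + v) / 2, else -1, so that E[bit] = v.
    bit = 1 if rng.random() < (1 + value) / 2 else -1
    # Randomized response on the value bit: flip it w.p. 1 - p1.
    if rng.random() >= p1:
        bit = -bit
    # Randomized response on key presence: keep "present" w.p. p2.
    return (1, bit) if rng.random() < p2 else (0, 0)


def calibrate(reports: List[Report], eps1: float,
              eps2: float) -> Tuple[float, float]:
    """Estimate (frequency, mean) for one key from perturbed reports."""
    p1, p2 = keep_prob(eps1), keep_prob(eps2)
    n = len(reports)
    n1 = sum(k for k, _ in reports)
    # E[n1] = n * (f * p2 + (1 - f) * (1 - p2)); solve for f.
    f_hat = (n1 / n - (1 - p2)) / (2 * p2 - 1)
    # Fake bits average 0, so E[sum of bits] = n * f * p2 * (2 * p1 - 1) * mean.
    s = sum(b for _, b in reports)
    denom = n * f_hat * p2 * (2 * p1 - 1)
    m_hat = s / denom if denom > 0 else 0.0
    # Clip to the valid value range [-1, 1].
    return f_hat, max(-1.0, min(1.0, m_hat))


# Usage: half of the users hold the key, each with value 0.5.
rng = random.Random(7)
users = [0.5 if i % 2 == 0 else None for i in range(200_000)]
reports = [perturb(u, 1.0, 1.0, rng) for u in users]
f_hat, m_hat = calibrate(reports, 1.0, 1.0)
print(f"frequency ~ {f_hat:.3f}, mean ~ {m_hat:.3f}")
```

Because the fake bits average zero, they add variance but drop out of the expected bit sum, which is why the mean estimator divides only by the estimated number of genuine key holders. This sketch does not model the iterative refinements of PrivKVM and PrivKVM+ or the collector-side virtual iterations.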
