PCKV: Locally Differentially Private Correlated Key-Value Data Collection with Optimized Utility

Data collection under local differential privacy (LDP) has been mostly studied for homogeneous data. Real-world applications often involve a mixture of different data types such as key-value pairs, where the frequency of keys and mean of values under each key must be estimated simultaneously. For key-value data collection with LDP, it is challenging to achieve a good utility-privacy tradeoff since the data contains two dimensions and a user may possess multiple key-value pairs. There is also an inherent correlation between key and values which if not harnessed, will lead to poor utility. In this paper, we propose a locally differentially private key-value data collection framework that utilizes correlated perturbations to enhance utility. We instantiate our framework by two protocols PCKV-UE (based on Unary Encoding) and PCKV-GRR (based on Generalized Randomized Response), where we design an advanced Padding-and-Sampling mechanism and an improved mean estimator which is non-interactive. Due to our correlated key and value perturbation mechanisms, the composed privacy budget is shown to be less than that of independent perturbation of key and value, which enables us to further optimize the perturbation parameters via budget allocation. Experimental results on both synthetic and real-world datasets show that our proposed protocols achieve better utility for both frequency and mean estimations under the same LDP guarantees than state-of-the-art mechanisms.

[1]  Philip S. Yu,et al.  $\textsf{LoPub}$ : High-Dimensional Crowdsourced Data Publication With Local Differential Privacy , 2016, IEEE Transactions on Information Forensics and Security.

[2]  Martin J. Wainwright,et al.  Local privacy and statistical minimax rates , 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[3]  Martin J. Wainwright,et al.  Minimax Optimal Procedures for Locally Private Estimation , 2016, ArXiv.

[4]  Yin Yang,et al.  Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy , 2016, CCS.

[5]  Jiming Chen,et al.  CALM: Consistent Adaptive Local Marginal for Marginal Release under Local Differential Privacy , 2018, CCS.

[6]  Teng Wang,et al.  Conditional Analysis for Key-Value Data with Local Differential Privacy , 2019, ArXiv.

[7]  Ge Yu,et al.  Collecting and Analyzing Multidimensional Data with Local Differential Privacy , 2019, 2019 IEEE 35th International Conference on Data Engineering (ICDE).

[8]  Janardhan Kulkarni,et al.  Collecting Telemetry Data Privately , 2017, NIPS.

[9]  Yin Yang,et al.  Generating Synthetic Decentralized Social Graphs with Local Differential Privacy , 2017, CCS.

[10]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[11]  Raef Bassily,et al.  Local, Private, Efficient Protocols for Succinct Histograms , 2015, STOC.

[12]  Xiaofeng Meng,et al.  PrivKV: Key-Value Data Collection with Local Differential Privacy , 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[13]  Frank McSherry,et al.  Privacy integrated queries: an extensible platform for privacy-preserving data analysis , 2009, SIGMOD Conference.

[14]  Ninghui Li,et al.  Locally Differentially Private Protocols for Frequency Estimation , 2017, USENIX Security Symposium.

[15]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[16]  Hongxia Jin,et al.  Private spatial data aggregation in the local setting , 2016, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[17]  Ninghui Li,et al.  Locally Differentially Private Frequent Itemset Mining , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[18]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[19]  Robert L. Wolpert,et al.  Statistical Inference , 2019, Encyclopedia of Social Network Analysis and Mining.

[20]  Yin Yang,et al.  Collecting and Analyzing Data from Smart Device Users with Local Differential Privacy , 2016, ArXiv.

[21]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.