Using Randomized Response for Differential Privacy Preserving Data Collection

This paper studies how to enforce differential privacy using randomized response in the data collection scenario. Given a client's value, a randomized algorithm executed by the client reports a perturbed value to the untrusted server. Using randomized response in surveys makes it easy to estimate population statistics accurately while preserving the privacy of individual respondents. We compare randomized response with the standard Laplace mechanism, which adds Laplace noise independently of the query output. Our research starts from the simple case of a single binary attribute and extends to the general case of multiple polychotomous attributes. We measure utility preservation in terms of the mean squared error of the estimate for several calculations, including the individual value estimate, the proportion estimate, and various derived statistics. Based on randomized response theory, we derive explicit formulas for the mean squared error of these derived statistics and prove that randomized response outperforms the Laplace mechanism. We evaluate our algorithms on the YesiWell database, which includes sensitive biomarker data and social network relationships of patients. Empirical evaluation results show the effectiveness of our proposed techniques. In particular, using randomized response to collect data incurs less utility loss than output perturbation when the sensitivity of the functions is high.
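To make the single binary attribute case concrete, the following is a minimal sketch of one classic randomized response scheme (Warner-style bit flipping), not necessarily the exact scheme used in the paper. It assumes each client reports its true bit with probability p = e^ε / (1 + e^ε), which satisfies ε-differential privacy for a single bit, and that the server inverts the known flipping probability to obtain an unbiased proportion estimate. The function names and the simulated population are illustrative assumptions.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit (assumed Warner-style scheme)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(value, epsilon, rng):
    """Client side: report the true bit with probability p, else the flipped bit."""
    p = keep_probability(epsilon)
    return value if rng.random() < p else 1 - value


def estimate_proportion(reports, epsilon):
    """Server side: unbiased estimate of the true proportion of ones.

    The observed proportion satisfies E[observed] = p*pi + (1-p)*(1-pi),
    so pi is recovered as (observed - (1-p)) / (2p - 1). Requires epsilon > 0.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Hypothetical population: 30% of clients hold the sensitive value 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
estimate = estimate_proportion(reports, 1.0)
print(f"estimated proportion: {estimate:.3f}")
```

The estimate is unbiased but noisier than the raw proportion: its variance grows as ε shrinks, because the divisor 2p − 1 approaches zero. This is the privacy and utility trade-off that the mean squared error analysis quantifies.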
