Privacy-aware Synthesizing for Crowdsourced Data

Although releasing crowdsourced data benefits data analyzers who wish to conduct statistical analysis, it may violate crowd users’ data privacy. A potential remedy is to employ traditional differential privacy (DP) mechanisms and perturb the data with noise before release. However, because crowdsourced data typically contain conflicts among users’ observations and are large in volume, directly applying these mechanisms cannot guarantee good utility in the setting of releasing crowdsourced data. To address this challenge, in this paper we propose a novel privacy-aware synthesizing method for crowdsourced data (PrisCrowd), with which the data collector can release users’ data under strong protection of their private information, while the data analyzer can still achieve good utility from the released data. Both theoretical analysis and extensive experiments on real-world datasets demonstrate the desired performance of the proposed method.
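The noise-perturbation step mentioned above is commonly realized with the Laplace mechanism from the DP literature. A minimal sketch of that baseline follows; the function name and parameters are illustrative only and are not drawn from the paper:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Perturb `value` with Laplace noise of scale sensitivity/epsilon,
    the standard calibration for epsilon-differential privacy."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: privatize a count query (sensitivity 1) with epsilon = 0.5.
# A fixed seed is used here only to make the example reproducible.
noisy_count = laplace_mechanism(100, sensitivity=1.0,
                                epsilon=0.5,
                                rng=np.random.default_rng(42))
```

Applying such a mechanism independently to every entry of a large, conflict-ridden crowdsourced dataset is exactly the baseline whose utility the abstract argues is poor, which motivates a synthesis-based approach instead.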
