Differentially private query learning: From data publishing to model publishing

As one of the most influential privacy definitions, differential privacy provides a rigorous and provable privacy guarantee for data publishing. In the Big Data era, however, the curator often has to release either the answers to a large batch of queries or a synthetic dataset. Two challenges must be tackled: how to reduce the correlation among large sets of queries, and how to answer fresh queries that were not part of the released batch. This paper transforms the data publishing problem into a machine learning problem, in which queries are treated as training samples and a prediction model is released instead of query results or a synthetic dataset. Once published, the model can be used both to answer the currently submitted queries and to predict results for fresh queries from the public. Compared with the traditional method, the proposed prediction model improves the accuracy of query results in non-interactive publishing. We prove that the learned model retains the utility of the published queries while preserving privacy.
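The idea of publishing a model trained on queries can be illustrated with a minimal sketch. It is not the paper's algorithm, and every specific in it is an assumption made for illustration: the queries are range counts over a small discrete domain, the training answers are perturbed with the Laplace mechanism (privacy budget split evenly across the batch), and the released model is a per-bin weight vector fitted with a normalized least-mean-squares update. All function names are hypothetical. Provided the training queries are chosen independently of the data, the model is fit only to the noisy answers, so releasing it is post-processing and consumes no additional privacy budget.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def true_count(data, lo, hi):
    """Exact answer to the range-count query lo <= x <= hi."""
    return sum(1 for x in data if lo <= x <= hi)


def noisy_answers(data, queries, epsilon, rng):
    """Answer a batch of count queries with the Laplace mechanism.

    Assumption: each count query has sensitivity 1 and the budget is split
    evenly over the m queries, so each answer gets noise of scale m / epsilon.
    """
    scale = len(queries) / epsilon
    return [true_count(data, lo, hi) + laplace_noise(scale, rng)
            for lo, hi in queries]


def fit_linear_model(queries, answers, n_bins, lr=0.5, epochs=200):
    """Fit one weight per bin so that sum(w[lo..hi]) tracks the noisy answer.

    Uses a normalized LMS step: each update shrinks the error on the
    current query by a factor of (1 - lr).
    """
    w = [0.0] * n_bins
    for _ in range(epochs):
        for (lo, hi), y in zip(queries, answers):
            width = hi - lo + 1
            err = sum(w[lo:hi + 1]) - y
            for b in range(lo, hi + 1):
                w[b] -= lr * err / width
    return w


def predict(w, lo, hi):
    """Answer any range query, including fresh ones, from the model alone."""
    return sum(w[lo:hi + 1])


if __name__ == "__main__":
    rng = random.Random(0)
    n_bins = 10
    data = [rng.randrange(n_bins) for _ in range(1000)]  # toy dataset

    # Training queries: all ranges of width 3 and width 5 (data-independent).
    train = [(lo, lo + k - 1) for k in (3, 5) for lo in range(n_bins - k + 1)]
    answers = noisy_answers(data, train, epsilon=1.0, rng=rng)

    model = fit_linear_model(train, answers, n_bins)  # the published artifact

    fresh = (2, 7)  # a query that was not in the training batch
    print("true:", true_count(data, *fresh))
    print("model:", round(predict(model, *fresh), 1))
```

In this sketch only `model` would be published; the raw data and the individual noisy answers stay with the curator. How close the prediction for a fresh query comes to the true count depends on the privacy budget, the size of the batch, and how well the training queries cover the domain.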
