Differentially private multi-party high-dimensional data publishing

In this paper, we study the novel problem of publishing high-dimensional data in a distributed multi-party environment under differential privacy. In particular, with the assistance of a semi-trusted curator, the involved parties (i.e., local data owners) collectively generate a synthetic integrated dataset while satisfying ε-differential privacy for any local dataset. To solve this problem, we present a differentially private sequential update of Bayesian network (DP-SUBN) solution. In DP-SUBN, the parties and the curator collaboratively identify the Bayesian network ℕ that best fits the integrated dataset D in a sequential manner, from which a synthetic dataset can then be generated. The fundamental advantage of adopting the sequential update manner is that the parties can treat the statistical results provided by previous parties as their prior knowledge to direct how to learn ℕ. The core of DP-SUBN is the construction of the search frontier, which can be seen as a priori knowledge to guide the parties to update ℕ. To improve the fitness of ℕ and reduce the communication cost, we introduce a correlation-aware search frontier construction (CSFC) approach, where attribute pairs with strong correlations are used to construct the search frontier. In particular, to privately quantify the correlations of attribute pairs without introducing too much noise, we first propose a non-overlapping covering design (NOCD) method, and then introduce a dynamic programming method to find the optimal parameters used in NOCD to ensure that the injected noise is minimum. Through formal privacy analysis, we show that DP-SUBN satisfies ε-differential privacy for any local dataset. Extensive experiments on a real dataset demonstrate that DP-SUBN offers desirable data utility with low communication cost.

[1]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[2]  Kenji Yamanishi Distributed Cooperative Bayesian Learning Strategies , 1999, Inf. Comput..

[3]  Elaine Shi,et al.  Differentially Private Continual Monitoring of Heavy Hitters from Distributed Streams , 2012, Privacy Enhancing Technologies.

[4]  Philip S. Yu,et al.  Privacy-preserving data publishing: A survey of recent developments , 2010, CSUR.

[5]  Alexandre B. Tsybakov,et al.  Introduction to Nonparametric Estimation , 2008, Springer series in statistics.

[6]  Johannes Gehrke,et al.  Differential privacy via wavelet transforms , 2009, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[7]  JiangWei,et al.  A secure distributed framework for achieving k-anonymity , 2006, VLDB 2006.

[8]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[9]  Assaf Schuster,et al.  Privacy-Preserving Distributed Stream Monitoring , 2014, NDSS.

[10]  Marianne Winslett,et al.  Differentially private data cubes: optimizing noise sources and consistency , 2011, SIGMOD '11.

[11]  Jun Zhang,et al.  PrivBayes: private data release via bayesian networks , 2014, SIGMOD Conference.

[12]  Yu Zhang,et al.  Differentially Private High-Dimensional Data Publication via Sampling-Based Inference , 2015, KDD.

[13]  Benjamin C. M. Fung,et al.  Secure Distributed Framework for Achieving ε-Differential Privacy , 2012, Privacy Enhancing Technologies.

[14]  Divesh Srivastava,et al.  Accurate and efficient private release of datacubes and contingency tables , 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[15]  Bhiksha Raj,et al.  Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers , 2010, NIPS.

[16]  Benjamin C. M. Fung,et al.  Anonymity meets game theory: secure data integration with malicious participants , 2011, The VLDB Journal.

[17]  Sheng Zhong,et al.  Privacy-enhancing k-anonymization of customer data , 2005, PODS.

[18]  Benjamin C. M. Fung,et al.  m-Privacy for collaborative data publishing , 2011, 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom).

[19]  Daniel A. Spielman,et al.  Spectral Graph Theory and its Applications , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[20]  Claude Castelluccia,et al.  I Have a DREAM! (DiffeRentially privatE smArt Metering) , 2011, Information Hiding.

[21]  Frank McSherry,et al.  Privacy integrated queries: an extensible platform for privacy-preserving data analysis , 2009, SIGMOD Conference.

[22]  C K HungPatrick,et al.  Centralized and Distributed Anonymization for High-Dimensional Healthcare Data , 2010 .

[23]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[24]  Chris Clifton,et al.  A secure distributed framework for achieving k-anonymity , 2006, The VLDB Journal.

[25]  Paul Francis,et al.  Towards Statistical Queries over Distributed Private User Data , 2012, NSDI.

[26]  Avi Wigderson,et al.  Completeness theorems for non-cryptographic fault-tolerant distributed computation , 1988, STOC '88.

[27]  Ninghui Li,et al.  Differentially Private Publishing of High-dimensional Data Using Sensitivity Control , 2015, AsiaCCS.

[28]  Suman Nath,et al.  Privacy-aware personalization for mobile advertising , 2012, CCS.

[29]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[30]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[31]  Ninghui Li,et al.  PriView: practical differentially private release of marginal contingency tables , 2014, SIGMOD Conference.

[32]  Li Xiong,et al.  A Comprehensive Comparison of Multiparty Secure Additions with Differential Privacy , 2017, IEEE Transactions on Dependable and Secure Computing.

[33]  Sanjay Goel,et al.  Collaborative Search Log Sanitization: Toward Differential Privacy and Boosted Utility , 2015, IEEE Transactions on Dependable and Secure Computing.

[34]  Cynthia Dwork,et al.  Privacy, accuracy, and consistency too: a holistic solution to contingency table release , 2007, PODS.