Practical Differential Privacy via Grouping and Smoothing

We address one-time publishing of non-overlapping counts with e-differential privacy. These statistics are useful in a wide and important range of applications, including transactional, traffic and medical data analysis. Prior work on the topic publishes such statistics with prohibitively low utility in several practical scenarios. Towards this end, we present GS, a method that pre-processes the counts by elaborately grouping and smoothing them via averaging. This step acts as a form of preliminary perturbation that diminishes sensitivity, and enables GS to achieve e-differential privacy through low Laplace noise injection. The grouping strategy is dictated by a sampling mechanism, which minimizes the smoothing perturbation. We demonstrate the superiority of GS over its competitors, and confirm its practicality, via extensive experiments on real datasets.

[1]  Ninghui Li,et al.  Provably Private Data Anonymization: Or, k-Anonymity Meets Differential Privacy , 2011, ArXiv.

[2]  Johannes Gehrke,et al.  Differential privacy via wavelet transforms , 2009, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[3]  Rathindra Sarathy,et al.  Evaluating Laplace Noise Addition to Satisfy Differential Privacy for Numeric Data , 2011, Trans. Data Priv..

[4]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[5]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[6]  Ashwin Machanavajjhala,et al.  Publishing Search Logs—A Comparative Study of Privacy Guarantees , 2012, IEEE Transactions on Knowledge and Data Engineering.

[7]  Yin Yang,et al.  Compressive mechanism: utilizing sparse representation in differential privacy , 2011, WPES.

[8]  Yin Yang,et al.  Differentially private histogram publication , 2012, The VLDB Journal.

[9]  Ashwin Machanavajjhala,et al.  Privacy: Theory meets Practice on the Map , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[10]  Kunal Talwar,et al.  On the geometry of differential privacy , 2009, STOC '10.

[11]  Tim Roughgarden,et al.  Interactive privacy via the median mechanism , 2009, STOC '10.

[12]  Dan Suciu,et al.  Boosting the accuracy of differentially private histograms through consistency , 2009, Proc. VLDB Endow..

[13]  Moni Naor,et al.  Our Data, Ourselves: Privacy Via Distributed Noise Generation , 2006, EUROCRYPT.

[14]  Ninghui Li,et al.  On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy , 2011, ASIACCS '12.

[15]  Pierangela Samarati,et al.  Generalizing Data to Provide Anonymity when Disclosing Information , 1998, PODS 1998.

[16]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[17]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[18]  Divesh Srivastava,et al.  Differentially Private Publication of Sparse Data , 2011, ArXiv.

[19]  Suman Nath,et al.  Differentially private aggregation of distributed time-series with transformation and encryption , 2010, SIGMOD Conference.

[20]  Andrew McGregor,et al.  Optimizing linear counting queries under differential privacy , 2009, PODS.

[21]  Divesh Srivastava,et al.  Differentially private summaries for sparse data , 2012, ICDT '12.

[22]  Sofya Raskhodnikova,et al.  Smooth sensitivity and sampling in private data analysis , 2007, STOC '07.

[23]  Johannes Gehrke,et al.  iReduct: differential privacy with reduced relative errors , 2011, SIGMOD '11.

[24]  Benjamin C. M. Fung,et al.  Anonymizing sequential releases , 2006, KDD '06.

[25]  Chun Yuan,et al.  Differentially Private Data Release through Multidimensional Partitioning , 2010, Secure Data Management.

[26]  Yin Yang,et al.  Low-Rank Mechanism: Optimizing Batch Queries under Differential Privacy , 2012, Proc. VLDB Endow..

[27]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.