Differentially Private H-Tree

In this paper, we study the problem of publishing a synopsis of two-dimensional datasets under differential privacy. The challenge is to answer range count queries accurately within a given privacy budget. State-of-the-art methods either construct a hierarchy of partitions or lay a one- or two-level equi-width grid over the data domain; these are not suitable for high-dimensional and skewed datasets, respectively. To overcome these issues, we propose private h-tree, a technique that combines a two-level tree with a data-dependent partitioning method. Because the height of the tree is kept low, h-tree requires less budget for node counts, so more budget can be spent on median splits. Because the splitting points of h-tree must be selected privately, we propose a recursive budget strategy that minimizes the noise added to query answers by reducing the number of median splits from linear to logarithmic. As a data-dependent approach, private h-tree provides accurate answers to range count queries under skewed data distributions. Experiments on both real-world and synthetic datasets compare the accuracy of our solution with state-of-the-art methods and show that it is more accurate, particularly on skewed datasets in the presence of outliers.
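To make the ideas in the abstract concrete, the following is a minimal sketch of a two-level, median-based synopsis with noisy cell counts. It is an illustration under stated assumptions, not the algorithm from the paper: the function names, the unit-square domain, the power-of-two fanout k, the even budget split between x and y splits, the exponential mechanism for private medians, and Laplace noise for counts are all choices made here for illustration. Recursive bisection is used so that each point takes part in only about log2(k) median selections per dimension, which is one plausible reading of the "linear to logarithmic" budget strategy.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_median(values, lo, hi, epsilon, rng):
    """Pick a split in [lo, hi] with the exponential mechanism.

    Utility is minus the rank distance from the true median (sensitivity 1),
    so each gap between consecutive sorted points gets weight
    gap_length * exp(epsilon / 2 * utility).
    """
    pts = [lo] + sorted(min(max(v, lo), hi) for v in values) + [hi]
    n = len(values)
    weights = []
    for i in range(len(pts) - 1):
        utility = -abs(i - n / 2.0)  # i points lie to the left of gap i
        weights.append((pts[i + 1] - pts[i]) * math.exp(epsilon / 2.0 * utility))
    if sum(weights) <= 0.0:  # degenerate interval: fall back to the midpoint
        return (lo + hi) / 2.0
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.uniform(pts[i], pts[i + 1])


def recursive_splits(values, lo, hi, k, eps_level, rng):
    """Return k - 1 sorted split points by recursive bisection.

    Assumes k is a power of two. Nodes on one level hold disjoint data,
    so each level costs eps_level in total (parallel composition).
    """
    if k <= 1:
        return []
    mid = private_median(values, lo, hi, eps_level, rng)
    left = [v for v in values if v < mid]
    right = [v for v in values if v >= mid]
    return (recursive_splits(left, lo, mid, k // 2, eps_level, rng)
            + [mid]
            + recursive_splits(right, mid, hi, k // 2, eps_level, rng))


def build_synopsis(points, k, eps_median, eps_count, seed=0):
    """Build k * k cells (x0, x1, y0, y1, noisy_count) over [0, 1) x [0, 1)."""
    rng = random.Random(seed)
    levels = max(1, int(math.log2(k)))
    eps_level = eps_median / (2 * levels)  # half for x splits, half for y splits
    xs = [p[0] for p in points]
    x_cuts = [0.0] + recursive_splits(xs, 0.0, 1.0, k, eps_level, rng) + [1.0]
    cells = []
    for x0, x1 in zip(x_cuts, x_cuts[1:]):
        stripe = [p for p in points if x0 <= p[0] < x1]
        ys = [p[1] for p in stripe]
        y_cuts = [0.0] + recursive_splits(ys, 0.0, 1.0, k, eps_level, rng) + [1.0]
        for y0, y1 in zip(y_cuts, y_cuts[1:]):
            true_count = sum(1 for p in stripe if y0 <= p[1] < y1)
            # Cells are disjoint, so one Laplace(1 / eps_count) draw per cell.
            cells.append((x0, x1, y0, y1,
                          true_count + laplace_noise(1.0 / eps_count, rng)))
    return cells


def range_count(cells, qx0, qx1, qy0, qy1):
    """Estimate a range count, assuming points are uniform inside each cell."""
    total = 0.0
    for x0, x1, y0, y1, noisy in cells:
        area = (x1 - x0) * (y1 - y0)
        if area <= 0.0:
            continue
        w = max(0.0, min(x1, qx1) - max(x0, qx0))
        h = max(0.0, min(y1, qy1) - max(y0, qy0))
        total += noisy * (w * h) / area
    return total
```

For example, `build_synopsis(points, k=4, eps_median=0.5, eps_count=0.5)` returns 16 cells, and `range_count(cells, 0.2, 0.6, 0.1, 0.9)` estimates the number of points in that rectangle. Under the assumptions above, the total privacy cost is eps_median + eps_count; how the paper actually divides the budget may differ.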
