Combining Public and Private Data

Differential privacy is widely adopted to provide provable privacy guarantees in data analysis. We consider the problem of combining public and private data (and, more generally, data with heterogeneous privacy needs) for estimating aggregate statistics. We introduce a mixed estimator of the mean optimized to minimize the variance. We argue that our mechanism is preferable to techniques that preserve the privacy of individuals by subsampling data proportionally to the privacy needs of users. Similarly, we present a mixed median estimator based on the exponential mechanism. We compare our mechanisms to the methods proposed in Jorgensen et al. [2015]. Our experiments provide empirical evidence that our mechanisms often outperform the baseline methods.

[1]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[2]  Anne-Marie Kermarrec,et al.  Heterogeneous Differential Privacy , 2015, J. Priv. Confidentiality.

[3]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[4]  Adam D. Smith,et al.  Privacy-preserving statistical estimation with optimal convergence rates , 2011, STOC '11.

[5]  Ting Yu,et al.  Conservative or liberal? Personalized differential privacy , 2015, 2015 IEEE 31st International Conference on Data Engineering.

[6]  Donald B. Rubin,et al.  THE VARIANCE OF A LINEAR COMBINATION OF INDEPENDENT ESTIMATORS USING ESTIMATED WEIGHTS , 1974 .