The Trade-Off Between Privacy and Fidelity via Ehrhart Theory

As an increasing amount of data is gathered nowadays and stored in databases, the question arises of how to protect the privacy of individual records in a database even while providing accurate answers to queries on the database. Differential Privacy (DP) has gained acceptance as a framework to quantify vulnerability of algorithms to privacy breaches. We consider the problem of how to sanitize an entire database via a DP mechanism, on which unlimited further querying is performed. While protecting privacy, it is important that the sanitized database still provide accurate responses to queries. The central contribution of this work is to characterize the amount of information preserved in an optimal DP database sanitizing mechanism (DSM). We precisely characterize the utility-privacy trade-off of mechanisms that sanitize databases in the asymptotic regime of large databases. We study this in an information-theoretic framework by modeling a generic distribution on the data, and a measure of fidelity between the histograms of the original and sanitized databases. We consider the popular $\mathbb {L}_{1}-$ distortion metric, i.e., the total variation norm that leads to the formulation as a linear program (LP). This optimization problem is prohibitive in complexity with the number of constraints growing exponentially in the parameters of the problem. Our focus on the asymptotic regime enables us characterize precisely, the limit of the sequence of solutions to this optimization problem. Leveraging tools from discrete geometry, analytic combinatorics, and duality theorems of optimization, we fully characterize this limit in terms of a power series whose coefficients are the number of integer points on a multidimensional convex cross-polytope studied by Ehrhart in 1967. Employing Ehrhart theory, we determine a simple closed form computable expression for the asymptotic growth of the optimal privacy-fidelity trade-off to infinite precision. At the heart of the findings is a deep connection between the minimum expected distortion and a fundamental construct in Ehrhart theory - Ehrhart series of an integral convex polytope.

[1]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[2]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[3]  G. Pólya,et al.  Problems and theorems in analysis , 1983 .

[4]  Pramod Viswanath,et al.  The Composition Theorem for Differential Privacy , 2013, IEEE Transactions on Information Theory.

[5]  Pramod Viswanath,et al.  The Staircase Mechanism in Differential Privacy , 2015, IEEE Journal of Selected Topics in Signal Processing.

[6]  Wojciech Szpankowski,et al.  Minimum Expected Length of Fixed-to-Variable Lossless Compression Without Prefix Constraints , 2011, IEEE Transactions on Information Theory.

[7]  Pramod Viswanath,et al.  Optimal Noise Adding Mechanisms for Approximate Differential Privacy , 2016, IEEE Transactions on Information Theory.

[8]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[9]  Neil J. A. Sloane,et al.  Low–dimensional lattices. VII. Coordination sequences , 1997, Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[10]  Pramod Viswanath,et al.  The Optimal Noise-Adding Mechanism in Differential Privacy , 2012, IEEE Transactions on Information Theory.

[11]  O. F. Cook The Method of Types , 1898 .

[12]  Kobbi Nissim,et al.  Impossibility of Differentially Private Universally Optimal Mechanisms , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[13]  Pramod Viswanath,et al.  Extremal Mechanisms for Local Differential Privacy , 2014, J. Mach. Learn. Res..

[14]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[15]  Wojciech Szpankowski,et al.  Types of Markov Fields and Tilings , 2016, IEEE Transactions on Information Theory.

[16]  John N. Tsitsiklis,et al.  Introduction to linear optimization , 1997, Athena scientific optimization and computation series.

[17]  S. Robins,et al.  Computing the Continuous Discretely: Integer-Point Enumeration in Polyhedra , 2007 .

[18]  Moni Naor,et al.  Our Data, Ourselves: Privacy Via Distributed Noise Generation , 2006, EUROCRYPT.

[19]  Kunal Talwar,et al.  On the geometry of differential privacy , 2009, STOC '10.

[20]  Lei Ying,et al.  On the relation between identifiability, differential privacy, and mutual-information privacy , 2014, 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[21]  L Sweeney,et al.  Weaving Technology and Policy Together to Maintain Confidentiality , 1997, Journal of Law, Medicine & Ethics.