Latent Feature Disclosure under Perfect Sample Privacy

Guaranteeing perfect data privacy seems incompatible with the economic and scientific opportunities offered by large-scale data collection and processing. This paper addresses this challenge by studying how to disclose latent features of a data set without compromising the privacy of its individual samples. We leverage counter-intuitive properties of the multivariate statistics of data samples and propose a technique that discloses collective properties of a data set while keeping each individual sample confidential. For a given statistical description of the data set, we show how to build an optimal disclosure strategy (mapping) using linear programming techniques. We provide necessary and sufficient conditions that determine when our approach is feasible, and illustrate the optimal solution in some simple scenarios. In certain scenarios the disclosure strategy turns out to be independent of the latent feature, and for these we provide explicit expressions for the achievable performance.
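To make the linear-programming construction concrete, here is a toy sketch (not the paper's actual formulation) under assumed conditions: two i.i.d. uniform binary samples X1, X2 and a latent feature W = X1 XOR X2. We search for a disclosure mapping p(y | x1, x2) maximizing Pr[Y = W] subject to perfect sample privacy, i.e. Y independent of each X_i individually. Both the objective and the privacy constraints are linear in the mapping's probabilities, so the problem is a linear program, here solved with `scipy.optimize.linprog`:

```python
# Toy sketch (illustrative assumptions, not the paper's construction):
# X1, X2 i.i.d. uniform bits, latent feature W = X1 XOR X2.
# Find p(y | x1, x2) maximizing Pr[Y = W] under perfect sample privacy.
import itertools
from scipy.optimize import linprog

xs = list(itertools.product([0, 1], repeat=2))   # alphabet of (x1, x2)
ys = [0, 1]                                      # disclosure alphabet
idx = {(y, x): k for k, (y, x) in enumerate(itertools.product(ys, xs))}
n = len(idx)
px = 0.25                                        # P(x1, x2), i.i.d. uniform

# Objective: maximize sum_x P(x) * p(y = w(x) | x); linprog minimizes,
# so negate the coefficients.
c = [0.0] * n
for x in xs:
    c[idx[(x[0] ^ x[1], x)]] = -px

A_eq, b_eq = [], []
# Normalization: sum_y p(y | x) = 1 for every x.
for x in xs:
    row = [0.0] * n
    for y in ys:
        row[idx[(y, x)]] = 1.0
    A_eq.append(row)
    b_eq.append(1.0)
# Perfect sample privacy: p(y | X_i = 0) = p(y | X_i = 1) for each i.
# Since the samples are i.i.d. uniform, p(y | x_i = b) is the average of
# p(y | x) over the two x's with x[i] == b, so the constraint is linear.
for i in range(2):
    for y in ys:
        row = [0.0] * n
        for x in xs:
            row[idx[(y, x)]] = 0.5 if x[i] == 0 else -0.5
        A_eq.append(row)
        b_eq.append(0.0)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * n)
print(-res.fun)   # 1.0: W is disclosed exactly while each X_i stays private
```

The optimum is achieved by the deterministic mapping Y = X1 XOR X2, which illustrates the counter-intuitive multivariate property the abstract refers to: Y determines W perfectly, yet Y is statistically independent of each sample on its own, since conditioning on a single X_i leaves Y uniform.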
