Data Privacy and Utility Trade-Off Based on Mutual Information Neural Estimator

In the era of big data and the Internet of Things (IoT), data owners need to share large amounts of data with intended receivers over insecure environments, which creates a trade-off between user privacy and data utility. This privacy-utility trade-off has been formulated as a privacy funnel based on mutual information. However, mutual information is difficult to characterize accurately when the sample size is small or the underlying distributions are unknown. In this article, we propose a privacy funnel based on the mutual information neural estimator (MINE), which optimizes the privacy-utility trade-off using estimated mutual information. Instead of computing mutual information in the traditional way, we estimate it with MINE, which obtains the estimate by training a neural network so that the result is as precise as possible. We use the estimated mutual information as the measure of both privacy and utility, and then formulate a problem that optimizes data utility by training a neural network while keeping the estimated privacy disclosure below a threshold. Simulation results demonstrate that the MINE estimate approximates the mutual information well even with a limited number of samples, so it can be used to quantify privacy leakage and data utility retention and to optimize the privacy-utility trade-off.
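As an illustration of the kind of quantity MINE estimates, the sketch below evaluates the Donsker-Varadhan lower bound on mutual information, E_joint[T] - log E_marginals[exp(T)], which MINE maximizes over a neural network T. It is a minimal sketch under stated assumptions, not the method of this article: the trained network is replaced by the analytically known optimal critic for a standard bivariate Gaussian, so that the bound can be checked against the closed-form value -0.5 log(1 - rho^2). All function names, the correlation rho = 0.5, and the sample size are hypothetical choices made for the example.

```python
import math
import random


def dv_lower_bound(joint_pairs, marginal_pairs, critic):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginals[exp(T)]."""
    joint_term = sum(critic(x, y) for x, y in joint_pairs) / len(joint_pairs)
    exp_mean = sum(math.exp(critic(x, y)) for x, y in marginal_pairs) / len(marginal_pairs)
    return joint_term - math.log(exp_mean)


def gaussian_log_ratio(rho):
    """Optimal critic log p(x,y) / (p(x) p(y)) for a standard bivariate
    Gaussian with correlation rho (assumption: stands in for a trained net)."""
    def critic(x, y):
        return (-0.5 * math.log(1 - rho ** 2)
                - (rho ** 2 * (x * x + y * y) - 2 * rho * x * y) / (2 * (1 - rho ** 2)))
    return critic


def sample_correlated_gaussian(n, rho, rng):
    """Draw n pairs (x, y) with unit variances and correlation rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def shuffled_marginals(pairs, rng):
    """Approximate samples from p(x) p(y) by shuffling the y values."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    rng.shuffle(ys)
    return list(zip(xs, ys))


rng = random.Random(0)
rho = 0.5
joint = sample_correlated_gaussian(20000, rho, rng)
marginal = shuffled_marginals(joint, rng)

true_mi = -0.5 * math.log(1 - rho ** 2)          # closed form, in nats
estimate = dv_lower_bound(joint, marginal, gaussian_log_ratio(rho))
trivial = dv_lower_bound(joint, marginal, lambda x, y: 0.0)  # uninformative critic

print(f"closed-form MI: {true_mi:.4f} nats")
print(f"DV bound with optimal critic: {estimate:.4f} nats")
print(f"DV bound with constant critic: {trivial:.4f} nats")
```

With the optimal critic the bound should land close to the closed-form value, while the constant critic yields a bound of zero. In MINE the critic is not known in advance; it is a neural network whose parameters are trained to push this bound upward.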
