Optimal Distribution Mapping for Inference Privacy

This work investigates information sanitization that protects an underlying label from being inferred through a data stream. The problem is posed as an optimal mapping, with minimum distortion, from an underlying distribution that reveals the class/label of the data to a target distribution. The optimal sanitization operations are transformed into convex optimization problems corresponding to the domains of the source and target distributions. In particular, when one of the distributions is discrete, a parallel is drawn to a biased quantization method, and an efficient sub-gradient method is proposed to derive the optimal transformation. The method is extended to a real-time scenario in which multiple source distributions are mapped to a fixed target distribution without prior knowledge of the label of the streaming data, in order to defeat any hypothesis test between the labels. It is shown that even when the source label is unknown to the sanitizer, optimal distortion is achievable with perfect privacy.
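To make the biased-quantization idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a squared-error distortion, a source that is uniform on [0, 1] and approximated by an evenly spaced grid, and a hypothetical discrete target with points `ys` and probabilities `q`. Each source value is sent to the target point minimizing distortion minus a per-point bias `w[j]`, and the biases are adjusted by sub-gradient steps until the fraction of source mass landing on each target point matches `q`. All function names, the step-size schedule, and the numerical values are illustrative assumptions.

```python
import math

def assign(x, ys, w):
    """Index of the target point chosen for x under the biased nearest-point rule."""
    return min(range(len(ys)), key=lambda j: (x - ys[j]) ** 2 - w[j])

def cell_masses(xs, ys, w):
    """Fraction of the (equally weighted) source points mapped to each target point."""
    counts = [0] * len(ys)
    for x in xs:
        counts[assign(x, ys, w)] += 1
    return [c / len(xs) for c in counts]

def fit_biases(xs, ys, q, iters=300, step0=0.2):
    """Sub-gradient updates on the biases w.

    A bias is raised when its cell holds less mass than the target
    probability and lowered when it holds more. The diminishing step
    size step0 / sqrt(t) is an assumed, standard choice.
    """
    w = [0.0] * len(ys)
    for t in range(1, iters + 1):
        mass = cell_masses(xs, ys, w)
        step = step0 / math.sqrt(t)
        w = [wj + step * (qj - mj) for wj, qj, mj in zip(w, q, mass)]
    return w

def distortion(xs, ys, w):
    """Mean squared distortion of the mapping induced by the biases w."""
    return sum((x - ys[assign(x, ys, w)]) ** 2 for x in xs) / len(xs)

# Assumed toy setup: uniform source on [0, 1], three target points.
xs = [(i + 0.5) / 1000 for i in range(1000)]
ys = [0.2, 0.5, 0.8]
q = [0.5, 0.3, 0.2]

w = fit_biases(xs, ys, q)
print("cell masses:", cell_masses(xs, ys, w))
print("unbiased distortion:", distortion(xs, ys, [0.0, 0.0, 0.0]))
print("biased distortion:  ", distortion(xs, ys, w))
```

With all biases at zero the rule reduces to ordinary nearest-point quantization, which gives the lowest distortion for these target points but does not reproduce `q`. The fitted biases shift the cell boundaries so the output distribution approaches the target, at the price of a higher distortion.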
