Optimal Multi-Source Inference Privacy — A Generalized Lloyd-Max Algorithm

This work investigates information sanitization that protects an underlying label from being inferred through multiple data sources. The problem is posed as an optimal mapping, with minimum distortion, from a set of underlying distributions that reveal the classes/labels of the data to a target distribution. The optimal sanitization operations are transformed into convex optimization problems corresponding to the domains of the source and target distributions. In particular, when the target distribution is discrete, a parallel is drawn to a "biased" quantization method, and an efficient sub-gradient method is proposed to derive the optimal transformation. The method is extended to a scenario where multiple continuous source distributions are to be mapped to an unknown discrete target distribution. A generalized version of the classical Lloyd-Max iterative algorithm is proposed to derive the optimal biased quantizers that achieve perfect inference privacy. A real-time system is also investigated, in which the sanitizer has no a priori information about the source distribution other than the class of possible source distributions. In this real-time framework, an algorithm is proposed that asymptotically achieves the same distortion as if the source distribution were known a priori.
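For orientation, the classical Lloyd-Max iteration that the proposed algorithm generalizes can be sketched as follows. This is a minimal illustration of the standard scalar quantizer under squared-error distortion, fitted to samples from a single source; it is not the biased, multi-source quantizer of this work. The function name `lloyd_max`, its parameters, and the quantile-based initialization are assumptions made for the example.

```python
import numpy as np

def lloyd_max(samples, num_levels, num_iters=100, tol=1e-9):
    """Classical Lloyd-Max scalar quantizer fitted to 1-D samples (illustrative)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    # Assumed initialization: reconstruction levels at evenly spaced quantiles.
    levels = np.quantile(samples, (np.arange(num_levels) + 0.5) / num_levels)
    prev = np.inf
    for _ in range(num_iters):
        # Nearest-neighbour step: thresholds are midpoints of adjacent levels.
        thresholds = 0.5 * (levels[:-1] + levels[1:])
        idx = np.searchsorted(thresholds, samples)
        # Centroid step: each level becomes the mean of the samples in its cell.
        for k in range(num_levels):
            cell = samples[idx == k]
            if cell.size:
                levels[k] = cell.mean()
        # Mean squared error of the current partition with the updated levels.
        distortion = np.mean((samples - levels[idx]) ** 2)
        if prev - distortion < tol:
            break
        prev = distortion
    # Recompute thresholds so they match the final levels.
    thresholds = 0.5 * (levels[:-1] + levels[1:])
    return levels, thresholds, distortion

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    levels, thresholds, distortion = lloyd_max(rng.normal(size=20000), 4)
    print(levels, thresholds, distortion)
```

Each pass alternates the two necessary optimality conditions (nearest-neighbour partition, centroid reconstruction), so the empirical distortion never increases from one iteration to the next. The abstract's generalized version additionally constrains the quantizer outputs so that every source distribution maps to the same discrete target, which this sketch does not attempt.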
