Privacy-Preserving Public Release of Datasets for Support Vector Machine Classification

We consider the problem of publicly releasing a dataset for support vector machine classification without infringing on the privacy of data subjects (i.e., individuals whose private information is stored in the dataset). The dataset is systematically obfuscated with additive noise for privacy protection. Motivated by the Cramér-Rao bound, the inverse of the trace of the Fisher information matrix is used as a measure of privacy. Conditions are established that ensure the classifiers extracted from the original dataset and from the obfuscated one are close to each other, which captures utility. The optimal noise distribution is determined by maximizing a weighted sum of the privacy and utility measures. The optimal privacy-preserving noise is proved to achieve local differential privacy. The results are generalized to a broader class of optimization-based supervised machine learning algorithms. The applicability of the methodology is demonstrated on multiple datasets.
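To make the privacy measure concrete, here is a minimal sketch. It assumes the noise is i.i.d. and zero-mean Gaussian on every scalar entry of the dataset. The method described above instead optimizes the noise distribution, so the Gaussian is only an illustrative stand-in. For additive noise with density p, the Fisher information about each private entry is the integral of p'(w)^2 / p(w), which equals 1/sigma^2 for N(0, sigma^2). With n independent entries the Fisher information matrix is diagonal, so the inverse of its trace is sigma^2 / n. The helper names (`gaussian_pdf`, `fisher_information_1d`, `privacy_measure`, `obfuscate`) are hypothetical.

```python
import math
import random


def gaussian_pdf(w, sigma):
    """Density of zero-mean Gaussian noise (illustrative assumption)."""
    return math.exp(-w * w / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def fisher_information_1d(pdf, lo, hi, steps=20000, h=1e-5):
    """Approximate the location Fisher information, the integral of
    p'(w)^2 / p(w) dw, using a midpoint rule on [lo, hi] and a
    central-difference derivative."""
    dw = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        w = lo + (k + 0.5) * dw
        p = pdf(w)
        if p <= 0.0:
            continue  # skip points where the density underflows to zero
        dp = (pdf(w + h) - pdf(w - h)) / (2.0 * h)
        total += dp * dp / p * dw
    return total


def privacy_measure(per_entry_fisher, num_entries):
    """Inverse trace of the Fisher information matrix, assuming i.i.d.
    noise on num_entries scalar entries (so the matrix is diagonal)."""
    return 1.0 / (per_entry_fisher * num_entries)


def obfuscate(dataset, sigma, rng):
    """Release a copy of the dataset with additive Gaussian noise on every entry."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in dataset]


if __name__ == "__main__":
    sigma = 2.0
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    n = sum(len(row) for row in data)
    # Integrate over +/- 10 sigma; the truncated tail mass is negligible.
    fi = fisher_information_1d(lambda w: gaussian_pdf(w, sigma), -10 * sigma, 10 * sigma)
    print(fi)  # approximately 1 / sigma**2
    print(privacy_measure(fi, n))  # approximately sigma**2 / n
    print(obfuscate(data, sigma, random.Random(0)))
```

Larger noise variance lowers the Fisher information and therefore raises the privacy measure. The price is a larger perturbation of the training data, which is the utility side that the weighted-sum objective trades against.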
