Differential privacy for learning vector quantization

Abstract: Prototype-based machine learning methods such as learning vector quantization (LVQ) offer flexible classification tools that represent a classification in terms of typical prototypes. This representation leads to a particularly intuitive classification scheme, since a human partner can inspect prototypes in the same way as data points. Yet it risks revealing private information contained in the training data, since a single training data point can significantly influence the location of a prototype. In this contribution, we investigate how to extend LVQ algorithmically such that it provably obeys the privacy constraints offered by the notion of differential privacy. More precisely, we demonstrate the sensitivity of LVQ to single data points and hence the need for private variants when the training data are possibly sensitive. We investigate three techniques that have been proposed in the context of differential privacy and extend them to LVQ schemes. We evaluate the effectiveness and efficiency of these schemes on various data sets, as well as their scalability and robustness with respect to the choice of meta-parameters and the characteristics of the training sets. Interestingly, one algorithm, which has been proposed in the literature because of its beneficial mathematical properties, does not scale well with data dimensionality, whereas two alternative techniques, which are based on simpler principles, display good results in practical settings.
