Privacy-Preserving Personal Model Training

Many current Internet services rely on inferences from models trained on user data. Commonly, both the training and inference tasks are carried out using cloud resources fed by personal data collected at scale from users. Holding and using such large collections of personal data in the cloud creates privacy risks to the data subjects, but is currently required for users to benefit from such services. We explore how to provide for model training and inference in a system where computation is pushed to the data in preference to moving data to the cloud, obviating many current privacy risks. Specifically, we take an initial model learnt from a small set of users and retrain it locally using data from a single user. We evaluate on two tasks: one supervised learning task, using a neural network to recognise users' current activity from accelerometer traces; and one unsupervised learning task, identifying topics in a large set of documents. In both cases the accuracy is improved. We also analyse the robustness of our approach against adversarial attacks, as well as its feasibility by presenting a performance evaluation on a representative resource-constrained device (a Raspberry Pi).

[1]  Michael T. Goodrich,et al.  A bridging model for parallel computation, communication, and I/O , 1996, CSUR.

[2]  C. Campagnoli,et al.  Misplaced confidences? , 1996, Fertility and sterility.

[3]  Thomas Hofmann,et al.  Probabilistic latent semantic indexing , 1999, SIGIR '99.

[4]  Ramakrishnan Srikant,et al.  Privacy-preserving data mining , 2000, SIGMOD '00.

[5]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[6]  Philip S. Yu,et al.  A General Survey of Privacy-Preserving Data Mining Models and Algorithms , 2008, Privacy-Preserving Data Mining.

[7]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.

[8]  Wei Liu,et al.  Scalable similarity search with optimized kernel hashing , 2010, KDD.

[9]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[10]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[11]  Blaine Nelson,et al.  The security of machine learning , 2010, Machine Learning.

[12]  Yasin Abbasi-Yadkori,et al.  Fast Approximate Nearest-Neighbor Search with k-Nearest Neighbor Graph , 2011, IJCAI.

[13]  Anand D. Sarwate,et al.  Differentially Private Empirical Risk Minimization , 2009, J. Mach. Learn. Res..

[14]  Blaine Nelson,et al.  Adversarial machine learning , 2019, AISec '11.

[15]  Gary M. Weiss,et al.  Activity recognition using cell phone accelerometers , 2011, SKDD.

[16]  Claudia Eckert,et al.  Adversarial Label Flips Attack on Support Vector Machines , 2012, ECAI.

[17]  John C. Duchi,et al.  Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[18]  Carlos Guestrin,et al.  Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[19]  Blaine Nelson,et al.  Poisoning Attacks against Support Vector Machines , 2012, ICML.

[20]  Gary M. Weiss,et al.  The Impact of Personalization on Smartphone-Based Activity Recognition , 2012, AAAI 2012.

[21]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[22]  Seunghak Lee,et al.  More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.

[23]  Fernando Pérez-González,et al.  Privacy-preserving data aggregation in smart metering systems: an overview , 2013, IEEE Signal Processing Magazine.

[24]  Anand D. Sarwate,et al.  Signal Processing and Machine Learning with Differential Privacy: Algorithms and Challenges for Continuous Data , 2013, IEEE Signal Processing Magazine.

[25]  Alexander J. Smola,et al.  Communication Efficient Distributed Machine Learning with the Parameter Server , 2014, NIPS.

[26]  C. L. Philip Chen,et al.  Data-intensive applications, challenges, techniques and technologies: A survey on Big Data , 2014, Inf. Sci..

[27]  Somesh Jha,et al.  Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing , 2014, USENIX Security Symposium.

[28]  Cecilia Mascolo,et al.  DSP.Ear: leveraging co-processor support for continuous audio sensing on smartphones , 2014, SenSys.

[29]  Xindong Wu,et al.  Data mining with big data , 2014, IEEE Transactions on Knowledge and Data Engineering.

[30]  Vitaly Shmatikov,et al.  Privacy-preserving deep learning , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[31]  Hamed Haddadi,et al.  Personal Data: Thinking Inside the Box , 2015, Aarhus Conference on Critical Alternatives.

[32]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Bernardo A. Huberman,et al.  Securing private data sharing in multi-party analytics , 2016, First Monday.

[34]  Jussi Kangasharju,et al.  Kvasir: Scalable Provision of Semantically Relevant Web Content on Big Data Framework , 2016, IEEE Transactions on Big Data.

[35]  Mikhail Belkin,et al.  Learning privately from multiparty data , 2016, ICML.

[36]  Ameet Talwalkar,et al.  MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..

[37]  Steve Uhlig,et al.  Tracking Personal Identifiers Across the Web , 2016, PAM.

[38]  Sotiris K. Tasoulis,et al.  Fast nearest neighbor search through sparse random projections and voting , 2016, 2016 IEEE International Conference on Big Data (Big Data).

[39]  Qi Li,et al.  Personal Data Management with the Databox: What's Inside the Box? , 2016, CAN@CoNEXT.

[40]  Weisong Shi,et al.  Edge Computing: Vision and Challenges , 2016, IEEE Internet of Things Journal.

[41]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[42]  Cecilia Mascolo,et al.  LEO: scheduling sensor inference algorithms across heterogeneous mobile processors and network resources , 2016, MobiCom.

[43]  Richard Mortier,et al.  Probabilistic Synchronous Parallel , 2017, ArXiv.

[44]  Vivek Tiwari,et al.  PrIA: A Private Intelligent Assistant , 2017, HotMobile.

[45]  Giuseppe Ateniese,et al.  Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning , 2017, CCS.

[46]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[47]  Liang Wang Owl: A General-Purpose Numerical Library in OCaml , 2017, ArXiv.

[48]  Martín Abadi,et al.  Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data , 2016, ICLR.

[49]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[50]  R. Hardwarsing Stochastic Gradient Descent with Differentially Private Updates , 2018 .

[51]  Raja Lavanya,et al.  Fog Computing and Its Role in the Internet of Things , 2019, Advances in Computer and Electrical Engineering.