Understanding and Controlling User Linkability in Decentralized Learning

Machine Learning techniques are widely used by online services (e.g. Google, Apple) in order to analyze and make predictions on user data. As many of the provided services are user-centric (e.g. personal photo collections, speech recognition, personal assistance), user data generated on personal devices is key to provide the service. In order to protect the data and the privacy of the user, federated learning techniques have been proposed where the data never leaves the user's device and "only" model updates are communicated back to the server. In our work, we propose a new threat model that is not concerned with learning about the content - but rather is concerned with the linkability of users during such decentralized learning scenarios. We show that model updates are characteristic for users and therefore lend themselves to linkability attacks. We show identification and matching of users across devices in closed and open world scenarios. In our experiments, we find our attacks to be highly effective, achieving 20x-175x chance-level performance. In order to mitigate the risks of linkability attacks, we study various strategies. As adding random noise does not offer convincing operation points, we propose strategies based on using calibrated domain-specific data; we find these strategies offers substantial protection against linkability threats with little effect to utility.

[1]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[2]  Vitaly Shmatikov,et al.  Privacy-preserving deep learning , 2015, Allerton.

[3]  Xiaohui Zhang,et al.  Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging , 2014, ICLR.

[4]  Silvio Lattanzi,et al.  An efficient reconciliation algorithm for social networks , 2013, Proc. VLDB Endow..

[5]  Hubert Eichner,et al.  Federated Learning for Mobile Keyboard Prediction , 2018, ArXiv.

[6]  Hubert Eichner,et al.  APPLIED FEDERATED LEARNING: IMPROVING GOOGLE KEYBOARD QUERY SUGGESTIONS , 2018, ArXiv.

[7]  Tianbao Yang,et al.  Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent , 2013, NIPS.

[8]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[9]  Sarvar Patel,et al.  Practical Secure Aggregation for Privacy-Preserving Machine Learning , 2017, IACR Cryptol. ePrint Arch..

[10]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[11]  Tassilo Klein,et al.  Differentially Private Federated Learning: A Client Level Perspective , 2017, ArXiv.

[12]  Apostol Natsev,et al.  YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.

[13]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[14]  Peter Richtárik,et al.  Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.

[15]  Mario Fritz,et al.  ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models , 2018, NDSS.

[16]  Tom Ouyang,et al.  Federated Learning Of Out-Of-Vocabulary Words , 2019, ArXiv.

[17]  Tribhuvanesh Orekondy,et al.  Knockoff Nets: Stealing Functionality of Black-Box Models , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Fan Zhang,et al.  Stealing Machine Learning Models via Prediction APIs , 2016, USENIX Security Symposium.

[19]  Latanya Sweeney,et al.  Guaranteeing anonymity when sharing medical data, the Datafly System , 1997, AMIA.

[20]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Hung Dang,et al.  Evading Classifiers by Morphing in the Dark , 2017, CCS.

[22]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[23]  Chen Sun,et al.  Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[25]  Seong Joon Oh,et al.  Adversarial Image Perturbation for Privacy Protection A Game Theory Perspective , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[27]  Krishna P. Gummadi,et al.  On the Reliability of Profile Matching Across Large Online Social Networks , 2015, KDD.

[28]  Wei Xu,et al.  CNN-RNN: A Unified Framework for Multi-label Image Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  James Demmel,et al.  ImageNet Training in Minutes , 2017, ICPP.

[30]  Martín Abadi,et al.  Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data , 2016, ICLR.

[31]  Gene Tsudik,et al.  Exploring Linkability of User Reviews , 2012, ESORICS.

[32]  Seong Joon Oh,et al.  Faceless Person Recognition: Privacy Implications in Social Media , 2016, ECCV.

[33]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Emiliano De Cristofaro,et al.  Knock Knock, Who's There? Membership Inference on Aggregate Location Data , 2017, NDSS.

[35]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[36]  Claude Castelluccia,et al.  How Unique and Traceable Are Usernames? , 2011, PETS.

[37]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[38]  Reza Shokri,et al.  Comprehensive Privacy Analysis of Deep Learning: Stand-alone and Federated Learning under Passive and Active White-box Inference Attacks , 2018, ArXiv.

[39]  Andreas Keller,et al.  Privacy in Epigenetics: Temporal Linkability of MicroRNA Expression Profiles , 2016, USENIX Security Symposium.

[40]  Ning Zhang,et al.  Beyond frontal faces: Improving Person Recognition using multiple cues , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Chi Lin,et al.  VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Giuseppe Ateniese,et al.  Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning , 2017, CCS.

[43]  Michael Naehrig,et al.  ML Confidential: Machine Learning on Encrypted Data , 2012, ICISC.

[44]  A. Anonymous,et al.  Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy , 2013, J. Priv. Confidentiality.

[45]  Seung-won Hwang,et al.  SocialSearch: enhancing entity search with social network matching , 2011, EDBT/ICDT '11.

[46]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[47]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[48]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[49]  Peter Richtárik,et al.  Fast distributed coordinate descent for non-strongly convex losses , 2014, 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).

[50]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Abhinav Gupta,et al.  Learning from Noisy Large-Scale Datasets with Minimal Supervision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Franco Zambonelli,et al.  Re-identification and information fusion between anonymized CDR and social network data , 2015, Journal of Ambient Intelligence and Humanized Computing.

[53]  Vitaly Shmatikov,et al.  Machine Learning Models that Remember Too Much , 2017, CCS.

[54]  Yoichi Sato,et al.  Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[55]  Shlomo Argamon,et al.  Effects of Age and Gender on Blogging , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[56]  Zhiqiang Shen,et al.  DSOD: Learning Deeply Supervised Object Detectors from Scratch , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[57]  Tribhuvanesh Orekondy,et al.  Connecting Pixels to Privacy and Utility: Automatic Redaction of Private Information in Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[59]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[60]  Bernt Schiele,et al.  $A^{4}NT$: Author Attribute Anonymity by Adversarial Training of Neural Machine Translation , 2017, 1711.01921.

[61]  Peter Norvig,et al.  The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.

[62]  Holger Schwenk,et al.  Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.

[63]  Rich Caruana,et al.  Multitask Learning , 1997, Machine Learning.

[64]  Michael I. Jordan,et al.  Adding vs. Averaging in Distributed Primal-Dual Optimization , 2015, ICML.

[65]  Ben Y. Zhao,et al.  With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning , 2018, USENIX Security Symposium.

[66]  Tribhuvanesh Orekondy,et al.  Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[67]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[68]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[69]  David A. Wagner,et al.  Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples , 2018, ICML.

[70]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  Peter Fankhauser,et al.  Identifying Users Across Social Tagging Systems , 2011, ICWSM.

[72]  Xiao Zhang,et al.  Cost-Sensitive Robustness against Adversarial Examples , 2018, ICLR.

[73]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[74]  Hannes Hartenstein,et al.  What Your Friends Tell Others About You: Low Cost Linkability of Social Network Profiles , 2011, SNAKDD 2011.

[75]  Qi Tian,et al.  Person Re-identification in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[77]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  Shafi Goldwasser,et al.  Machine Learning Classification over Encrypted Data , 2015, NDSS.

[79]  Somesh Jha,et al.  Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures , 2015, CCS.

[80]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[81]  Sree Hari Krishnan Parthasarathi,et al.  Exploiting innocuous activity for correlating users across sites , 2013, WWW.

[82]  Bo Li,et al.  DBA: Distributed Backdoor Attacks against Federated Learning , 2020, ICLR.

[83]  Vitaly Shmatikov,et al.  How To Backdoor Federated Learning , 2018, AISTATS.

[84]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.

[85]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[86]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[87]  Ameet Talwalkar,et al.  Federated Multi-Task Learning , 2017, NIPS.

[88]  Peter Richtárik,et al.  Federated Optimization: Distributed Machine Learning for On-Device Intelligence , 2016, ArXiv.

[89]  H. Brendan McMahan,et al.  Learning Differentially Private Recurrent Language Models , 2017, ICLR.

[90]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[91]  Gérard G. Medioni,et al.  Pose-Aware Face Recognition in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[92]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.

[93]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[94]  Nikita Borisov,et al.  Property Inference Attacks on Fully Connected Neural Networks using Permutation Invariant Representations , 2018, CCS.

[95]  Michael L. Nelson,et al.  An Unsupervised Approach to Discovering and Disambiguating Social Media Profiles , 2011 .

[96]  Seong Joon Oh,et al.  Towards Reverse-Engineering Black-Box Neural Networks , 2017, ICLR.

[97]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[98]  Hubert Eichner,et al.  Towards Federated Learning at Scale: System Design , 2019, SysML.

[99]  Michael Naehrig,et al.  CryptoNets: applying neural networks to encrypted data with high throughput and accuracy , 2016, ICML 2016.

[100]  Christopher Meek,et al.  Adversarial learning , 2005, KDD '05.

[101]  Gideon S. Mann,et al.  Distributed Training Strategies for the Structured Perceptron , 2010, NAACL.

[102]  J. Robins,et al.  A Structural Approach to Selection Bias , 2004, Epidemiology.

[103]  R. Berk An introduction to sample selection bias in sociological data. , 1983 .