Differential Privacy-Preserving User Linkage across Online Social Networks

Many people maintain accounts at multiple online social networks (OSNs). Multi-OSN user linkage seeks to link the same person’s web profiles and integrate their data across different OSNs. It is widely recognized as a key enabler for many important network applications. Unfortunately, user linkage is accompanied by growing privacy concerns about real identity leakage and the disclosure of sensitive user attributes. This paper initiates the study of privacy-preserving user linkage across multiple OSNs. We consider a social data collector (SDC) that collects perturbed user data from multiple OSNs and then performs user linkage for commercial data applications. To ensure strong user privacy, we introduce two novel differential privacy notions, ϵ-attribute indistinguishability and ϵ-profile indistinguishability, which ensure that any two users’ similar attributes and profiles cannot be distinguished after perturbation. We then present a novel Multivariate Laplace Mechanism (MLM) to achieve both notions. Finally, we propose a novel differential privacy-preserving user linkage framework in which the SDC trains a classifier for user linkage across different OSNs. Extensive experimental studies based on three real datasets confirm the efficacy of the proposed framework.
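
To make the idea of perturbing an attribute vector before it reaches the SDC more concrete, the following is a minimal sketch of one standard way to add multivariate Laplace-style noise: draw a noise vector z whose density is proportional to exp(-ϵ‖z‖₂). This is an illustrative assumption, not necessarily the MLM defined in the paper. The function name `perturb_attribute`, the encoding of an attribute as a real-valued vector, and the choice of Euclidean distance are all hypothetical.

```python
import numpy as np

def perturb_attribute(x, epsilon, rng=None):
    """Return x + z, where z has density proportional to exp(-epsilon * ||z||_2).

    Assumption: x is a 1-D real-valued encoding of one user attribute.
    This is a generic sketch, not the paper's exact mechanism.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    d = x.size
    # Direction: a normalized Gaussian vector is uniform on the unit sphere.
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    # Radius: density proportional to r^(d-1) * exp(-epsilon * r),
    # i.e. a Gamma distribution with shape d and scale 1/epsilon.
    radius = rng.gamma(shape=d, scale=1.0 / epsilon)
    return x + radius * direction

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attribute = np.array([0.2, 0.5, 0.1])  # hypothetical attribute vector
    print(perturb_attribute(attribute, epsilon=1.0, rng=rng))
```

With noise of this form, the output densities for two inputs x and x' differ by a factor of at most exp(ϵ‖x − x'‖₂), so nearby attribute vectors are hard to tell apart while distant ones remain distinguishable. A smaller ϵ gives stronger privacy at the cost of larger expected noise (the mean radius is d/ϵ).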