You Are How You Move: Linking Multiple User Identities From Massive Mobility Traces

Understanding the linkability of online user identifiers (IDs) is critical to both service providers (for business intelligence) and individual users (for assessing privacy risks). Existing methods are designed to match IDs across two services, but face key challenges when matching across multiple services in practice, particularly when users have multiple IDs per service. In this paper, we propose a novel system to link IDs across multiple services by exploiting the spatial-temporal locality of user activities. The core idea is that the same user’s online IDs are more likely to repeatedly appear at the same location. Specifically, we first use a contact graph to capture the “co-location” of all IDs across multiple services. Based on this graph, we propose a set-wise matching algorithm to discover candidate ID sets, and use Bayesian inference to generate confidence scores for candidate ranking, which is proved to be optimal. We evaluate our system using two real-world ground-truth datasets, one from an ISP (4 services, 815K IDs) and one from Twitter-Foursquare (2 services, 770 IDs). Extensive results show that our system significantly outperforms the state-of-the-art algorithms in accuracy (AUC is higher by 0.1-0.2), and it is highly robust against matching order and number of
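
To make the contact-graph idea concrete, the following is a minimal sketch and not the system described in the paper. It assumes each observation is a hypothetical tuple `(service, user_id, location, time_slot)` and treats two IDs as "co-located" when they share the same location and time slot. The function name `build_contact_graph`, the record layout, and the use of a raw co-occurrence count as the edge weight are all illustrative assumptions; the paper ranks candidates with Bayesian confidence scores, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import combinations


def build_contact_graph(records):
    """Count how often each pair of IDs is seen in the same place and time slot.

    `records` is an iterable of (service, user_id, location, time_slot)
    tuples. This layout is an assumption made for illustration only.
    """
    # Group the (service, user_id) pairs observed in each (location, time_slot) cell.
    cells = defaultdict(set)
    for service, user_id, location, time_slot in records:
        cells[(location, time_slot)].add((service, user_id))

    # Edge weight = number of cells in which both IDs appear together.
    # Pairs within the same service are kept, since a user may hold
    # several IDs on one service.
    weights = defaultdict(int)
    for ids in cells.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy data: a1 and b1 co-occur twice; b2 joins them only once.
records = [
    ("A", "a1", "cell1", 0),
    ("B", "b1", "cell1", 0),
    ("B", "b2", "cell1", 0),
    ("A", "a1", "cell2", 1),
    ("B", "b1", "cell2", 1),
    ("A", "a1", "cell3", 2),
    ("B", "b2", "cell4", 2),
]
graph = build_contact_graph(records)
```

In this toy input the pair (`a1`, `b1`) receives the largest weight because the two IDs repeatedly appear together, which is the intuition the abstract describes. A real pipeline would still need a principled score to separate genuine matches from IDs that merely share a crowded location.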
