User Identity Linkage via Co-Attentive Neural Network From Heterogeneous Mobility Data

Online services play critical roles in almost every aspect of users' lives, and users usually hold multiple online identities (IDs) across different services. To fuse user data scattered across multiple services for better business intelligence, service providers need to link the online IDs that belong to the same user. Meanwhile, the popularity of mobile networks and GPS-equipped smart devices has provided a generic way to link IDs: utilizing the mobility traces of IDs. However, linking IDs based on their mobility traces is a challenging problem because mobility data across services is highly heterogeneous, incomplete and noisy. In this paper, we propose DPLink, an end-to-end deep learning framework for the user identity linkage task on heterogeneous mobility data collected from different services with different properties. DPLink consists of a feature extractor and a comparator. The feature extractor comprises a location encoder and a trajectory encoder that extract representative features from trajectories. The comparator compares two trajectories and decides whether to link them as the same user. In particular, we propose a pre-training strategy based on a simple task to overcome the training difficulties introduced by the highly heterogeneous nature of mobility data from different sources. In addition, we introduce a multi-modal embedding network and a co-attention mechanism in DPLink to deal with the low quality of mobility data. Extensive experiments on two real-life ground-truth mobility datasets against eight baselines demonstrate that DPLink outperforms the state-of-the-art solutions by more than 15% in terms of hit-precision. Moreover, DPLink can be extended to incorporate external geographical context data, and it works stably on heterogeneous, noisy mobility traces.
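The feature-extractor/comparator split and the co-attention idea can be made concrete with a small sketch. The code below is written in standard-library Python and shows one common form of co-attention between two encoded trajectories, followed by a logistic comparator. It assumes that each trajectory has already been encoded into a sequence of d-dimensional vectors, which is the role the location and trajectory encoders play. Several choices are illustrative assumptions and do not reproduce DPLink's exact formulation:

- the helper names (`attend`, `mean_pool`, `link_probability`);
- scaled dot-product scoring and mean pooling;
- the |difference|/product comparator features;
- the hand-set weights.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per trajectory step, already encoded (assumed)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row becomes a
    softmax-weighted average of the key rows (keys double as values)."""
    dim = len(keys[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return out


def mean_pool(seq: Matrix) -> Vector:
    # Average the step vectors into a single trajectory summary.
    return [sum(col) / len(seq) for col in zip(*seq)]


def link_probability(traj_a: Matrix, traj_b: Matrix, weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical comparator: probability that two encoded trajectories
    belong to the same user (illustrative, not DPLink's exact model)."""
    # Co-attention: summarise A as selected by B's steps, and B as selected by A's steps.
    a_given_b = mean_pool(attend(traj_b, traj_a))
    b_given_a = mean_pool(attend(traj_a, traj_b))
    # Comparator features: element-wise |difference|, then element-wise product.
    features = [abs(x - y) for x, y in zip(a_given_b, b_given_a)]
    features += [x * y for x, y in zip(a_given_b, b_given_a)]
    if len(weights) != len(features):
        raise ValueError("expected %d comparator weights" % len(features))
    logit = dot(weights, features) + bias
    # Logistic output in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[5.0, 5.0], [6.0, 6.0]]
    w = [-1.0, -1.0, 0.0, 0.0]  # illustrative hand-set weights, not trained values
    print(link_probability(a, a, w))  # identical inputs, zero bias -> 0.5
    print(link_probability(a, b, w))  # dissimilar inputs -> below 0.5
```

Both summaries come from the same attention routine, so identical inputs yield a zero |difference| vector. With negative weights on those features, dissimilar pairs score lower. In a trained model, the encoders, the attention parameters and the comparator weights would all be learned jointly, and this sketch does not attempt that.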
