User identification in cyber-physical space: a case study on mobile query logs and trajectories

User identification across domains has drawn considerable research effort in recent years. Whereas most existing work focuses on user identification within a single space, this paper makes a first attempt to identify users by fusing their activities in cyber space and physical space, which yields a more comprehensive understanding of users' online behaviours as well as their offline visits. Our key insight is that the stable location distribution of IP addresses can connect the cyber space to the physical space. We therefore propose a novel framework for user identification in cyber-physical space, which consists of three key steps: 1) modeling the location distribution of each IP address; 2) computing co-occurrence with an inverted index to reduce the space and time cost; and 3) applying a learning-to-rank method to fuse user features shared by both spaces and improve accuracy. We conduct experiments that identify individual users from mobile query logs (generated in cyber space) and trajectory data (generated in physical space) to demonstrate the efficiency and effectiveness of our framework.
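The co-occurrence step can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes each query has already been mapped to one location cell via its IP address (a simplification of step 1), that a co-occurrence means sharing the same hypothetical (cell, hour) key, and that function names such as `build_inverted_index` are purely illustrative. The point it shows is why an inverted index helps: each query-log user is compared only with trajectory users that share at least one key, rather than with every trajectory user.

```python
from collections import Counter, defaultdict


def build_inverted_index(records):
    """Map each (cell, hour) key to the set of users observed under that key."""
    index = defaultdict(set)
    for user, key in records:
        index[key].add(user)
    return index


def co_occurrence_counts(query_records, trajectory_index):
    """For each query-log user, count the keys shared with each trajectory user."""
    keys_by_user = defaultdict(set)
    for user, key in query_records:
        keys_by_user[user].add(key)

    counts = {}
    for user, keys in keys_by_user.items():
        shared = Counter()
        for key in keys:
            # Only trajectory users posted under this key are touched.
            for traj_user in trajectory_index.get(key, ()):
                shared[traj_user] += 1
        counts[user] = shared
    return counts


# Toy data (assumed): keys are (grid cell, hour of day).
trajectory_records = [
    ("t1", ("c1", 8)), ("t1", ("c2", 9)),
    ("t2", ("c1", 8)), ("t2", ("c3", 20)),
]
query_records = [
    ("q1", ("c1", 8)), ("q1", ("c2", 9)),
    ("q2", ("c3", 20)),
]

index = build_inverted_index(trajectory_records)
counts = co_occurrence_counts(query_records, index)
best_match = {user: c.most_common(1)[0][0] for user, c in counts.items() if c}
```

In this toy setting the raw counts act as a single matching feature; the framework described above instead feeds such features into a learning-to-rank model, which this sketch does not attempt to reproduce.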
