EBM: an entropy-based model to infer social strength from spatiotemporal data

The ubiquity of mobile devices and the popularity of location-based services have generated, for the first time, rich datasets of people's location information at very high fidelity. These location datasets can be used to study people's behavior. For example, social studies have shown that people who are frequently seen together at the same place and at the same time are most likely socially related. In this paper, we are interested in inferring these social connections by analyzing people's location information, which is useful in a variety of application domains, from sales and marketing to intelligence analysis. In particular, we propose an entropy-based model (EBM) that not only infers social connections but also estimates their strength by analyzing people's co-occurrences in space and time. We examine two independent ways in which co-occurrences contribute to social strength: diversity and weighted frequency. In addition, we take the characteristics of each location into consideration in order to compensate for cases where only limited location information is available. We conducted extensive experiments with real-world datasets that include both people's location data and their social connections, using the latter as the ground truth to verify the results of applying our approach to the former. We show that our approach outperforms competing approaches.
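
The abstract names diversity and weighted frequency but gives no formulas, so the following is only an illustrative sketch under stated assumptions, not the model as defined in the paper. It assumes that diversity is the exponential of the Shannon entropy of a pair's co-occurrences across locations, and that weighted frequency discounts each co-occurrence by the exponential of the negative entropy of the location where it happened, so that crowded places count less. The function names and the input layout are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a collection of positive counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def diversity(cooccurrences):
    """Effective number of distinct places where a pair co-occurs.

    cooccurrences: list of location ids, one entry per co-occurrence event.
    Assumption: diversity = exp(Shannon entropy of the location distribution).
    """
    counts = Counter(cooccurrences).values()
    return math.exp(shannon_entropy(counts))


def location_entropy(visits):
    """Entropy of a location over the users who visit it.

    visits: list of user ids, one entry per visit to the location.
    A crowded, popular place has high entropy.
    """
    return shannon_entropy(Counter(visits).values())


def weighted_frequency(cooccurrences, visits_by_location):
    """Co-occurrence count, discounted by how crowded each location is.

    Assumption: each co-occurrence at a location is weighted by
    exp(-location_entropy), so meetings at private places count more.
    """
    counts = Counter(cooccurrences)
    return sum(
        n * math.exp(-location_entropy(visits_by_location[loc]))
        for loc, n in counts.items()
    )
```

Under these assumptions, a pair that meets twice at each of two places has a diversity of about 2, while a pair that meets four times at a single place has a diversity of 1, even though both pairs have the same raw co-occurrence count.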
