Inferring social ties from geographic coincidences

We investigate the extent to which social ties between people can be inferred from co-occurrence in time and space: Given that two people have been in approximately the same geographic locale at approximately the same time, on multiple occasions, how likely are they to know each other? Furthermore, how does this likelihood depend on the spatial and temporal proximity of the co-occurrences? Such issues arise in data originating in both online and offline domains as well as settings that capture interfaces between online and offline behavior. Here we develop a framework for quantifying the answers to such questions, and we apply this framework to publicly available data from a social media site, finding that even a very small number of co-occurrences can result in a high empirical likelihood of a social tie. We then present probabilistic models showing how such large probabilities can arise from a natural model of proximity and co-occurrence in the presence of social ties. In addition to providing a method for establishing some of the first quantifiable estimates of these measures, our findings have potential privacy implications, particularly for the ways in which social structures can be inferred from public online records that capture individuals’ physical locations over time.
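To make the notion of a spatio-temporal co-occurrence concrete, the following is a minimal sketch of one way such counts could be computed. It is an illustration under stated assumptions, not the procedure used in the paper: the record layout `(user, lat, lon, t_days)`, the function names, and the parameters `cell_deg` (side of a latitude-longitude grid cell, in degrees) and `window_days` (temporal tolerance) are all hypothetical choices made for this example. Two users are counted as co-occurring in a cell if each has at least one record in that cell and some pair of their timestamps differs by at most `window_days`; the score for a pair is the number of distinct cells in which that happens.

```python
import math
from collections import defaultdict
from itertools import combinations


def cell_of(lat, lon, cell_deg):
    """Map a coordinate to a grid cell index (hypothetical discretization)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def co_occurrence_counts(records, cell_deg=1.0, window_days=1.0):
    """Count, per user pair, the distinct cells where both appear close in time.

    records: iterable of (user, lat, lon, t_days) tuples (assumed layout).
    Returns a dict mapping (user_a, user_b), with user_a < user_b, to a count.
    """
    # Group timestamps by cell, then by user.
    by_cell = defaultdict(lambda: defaultdict(list))
    for user, lat, lon, t in records:
        by_cell[cell_of(lat, lon, cell_deg)][user].append(t)

    counts = defaultdict(int)
    for visits in by_cell.values():
        # Every unordered pair of users seen in this cell is a candidate.
        for a, b in combinations(sorted(visits), 2):
            close = any(
                abs(ta - tb) <= window_days
                for ta in visits[a]
                for tb in visits[b]
            )
            if close:
                counts[(a, b)] += 1
    return dict(counts)


# Toy data, invented purely for illustration.
records = [
    ("alice", 40.7, -73.9, 0.0),
    ("bob", 40.2, -73.5, 0.5),
    ("alice", 48.8, 2.3, 10.0),
    ("bob", 48.9, 2.4, 10.2),
    ("carol", 48.1, 2.9, 30.0),
]
print(co_occurrence_counts(records))
```

Varying `cell_deg` and `window_days` in a sketch like this corresponds to the question raised above of how the likelihood of a tie depends on spatial and temporal proximity. An empirical estimate would then compare, for each count value, the fraction of pairs with that count that are linked in a known social graph.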
