"I don't have a photograph, but you can have my footprints.": Revealing the Demographics of Location Data

Location data are routinely available to a plethora of mobile apps and third-party web services. The resulting datasets are increasingly available to advertisers for targeting and are also requested by governmental agencies for law enforcement purposes. While the re-identification risk of such data has been widely reported, the discriminative power of mobility has received much less attention. In this study we fill this void with an open and reproducible method. We explore how the growing number of geotagged footprints left behind by social network users in photo-sharing services makes it possible to infer demographic information from mobility patterns. Chief among these, we provide the first detailed analysis of ethnic mobility patterns in two metropolitan areas. This analysis allows us to examine questions pertaining to spatial segregation and the extent to which ethnicity can be inferred using only location data. Our results reveal that even a few location records at a coarse granularity can be sufficient for simple algorithms to draw an accurate inference. Our method generalizes to other features, such as gender, offering for the first time a general approach to evaluating the discriminative risks associated with location-enabled personalization.
