Home location inference from sparse and noisy data: models and applications

Accurate home location inference is increasingly important for urban computing. Existing methods either rely on continuous (and expensive) Global Positioning System (GPS) data or suffer from poor accuracy. In particular, the sparse and noisy nature of social media data makes it hard to pinpoint where people live at scale. We revisit this problem and infer home location within 100 m × 100 m squares at 70% accuracy for 76% and 71% of active users in New York City and the Bay Area, respectively. To the best of our knowledge, this is the first time home location has been detected at such a fine granularity using sparse and noisy data. Since people spend a large portion of their time at home, our model enables novel applications. As an example, we focus on modeling people’s health at scale by linking their home locations with publicly available statistics, such as education disparity. Results in multiple geographic regions demonstrate both the effectiveness and the added value of our home localization method and reveal insights that eluded earlier studies. In addition, we are able to discover the real buzz in the communities where people live.
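To make the 100 m × 100 m granularity concrete, the following is a minimal sketch of how geotagged posts could be snapped to grid cells and the most visited cell picked as a naive home candidate. It is a baseline illustration only, not the model summarized above: the function names `cell_of` and `most_visited_cell`, the equirectangular approximation, and the constant of roughly 111,320 m per degree of latitude are all assumptions introduced here.

```python
import math
from collections import Counter

CELL_M = 100.0             # grid cell edge length in meters
M_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude (assumption)


def cell_of(lat, lon, ref_lat):
    """Snap a (lat, lon) point to a 100 m grid cell index.

    Uses an equirectangular approximation around ref_lat, which is
    only reasonable within a single metropolitan area.
    """
    y = lat * M_PER_DEG_LAT
    x = lon * M_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    return (math.floor(x / CELL_M), math.floor(y / CELL_M))


def most_visited_cell(points, ref_lat):
    """Return the most frequent cell and its share of all points.

    A naive baseline (hypothetical helper): real data would need
    filtering by time of day, noise handling, and so on.
    """
    if not points:
        raise ValueError("points must not be empty")
    counts = Counter(cell_of(lat, lon, ref_lat) for lat, lon in points)
    cell, n = counts.most_common(1)[0]
    return cell, n / len(points)


# Hypothetical posts: three at one spot, one several kilometers away.
posts = [(40.75, -73.99), (40.75, -73.99), (40.75, -73.99), (40.80, -73.95)]
home_cell, share = most_visited_cell(posts, ref_lat=40.75)
```

A frequency count like this is sensitive to exactly the sparsity and noise the abstract describes, which is why it serves only as a point of comparison.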
