You Are Where You Go: Inferring Demographic Attributes from Location Check-ins

User profiling is crucial to many online services. Several recent studies suggest that demographic attributes are predictable from different kinds of online behavioral data, such as users' "Likes" on Facebook, friendship relations, and the linguistic characteristics of tweets. However, location check-ins, which bridge users' offline and online lives, have largely been overlooked in inferring user profiles. In this paper, we investigate the predictive power of location check-ins for inferring users' demographics and propose a simple yet general location-to-profile (L2P) framework. More specifically, we extract rich semantics of users' check-ins in terms of spatiality, temporality, and location knowledge, where the location knowledge is enriched with semantics mined from heterogeneous domains, including both online customer review sites and social networks. Additionally, tensor factorization is employed to derive low-dimensional representations of users' intrinsic check-in preferences with respect to these factors. The extracted features are then used to train predictive models for inferring various demographic attributes. We collect a large dataset consisting of profiles of 159,530 verified users from an online social network. Extensive experimental results on this dataset show that: 1) location check-ins are diagnostic representations of a variety of demographic attributes, such as gender, age, education background, and marital status; 2) the proposed framework substantially outperforms the compared models for profile inference in terms of various evaluation metrics, such as precision, recall, F-measure, and AUC.
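
To make the tensor factorization step concrete, the following is a minimal sketch, not the paper's actual method. The abstract does not say which factorization is used, so this example assumes a user x location-category x time-bin count tensor and substitutes a truncated SVD of the user-mode unfolding (an HOSVD-style factor) to obtain low-dimensional user representations. The function names, the tensor layout, and the toy check-ins are all hypothetical.

```python
import numpy as np


def build_checkin_tensor(checkins, n_users, n_categories, n_time_bins):
    """Count check-ins into a user x category x time-bin tensor (assumed layout)."""
    tensor = np.zeros((n_users, n_categories, n_time_bins))
    for user, category, time_bin in checkins:
        tensor[user, category, time_bin] += 1.0
    return tensor


def user_factors(tensor, rank):
    """Return rank-dimensional user embeddings from the user-mode unfolding.

    This is a stand-in for the unspecified factorization: unfold the tensor
    so that each row holds one user's counts, then keep the top singular
    directions scaled by their singular values.
    """
    n_users = tensor.shape[0]
    unfolded = tensor.reshape(n_users, -1)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank] * s[:rank]


# Hypothetical toy data: (user id, location category id, time-bin id).
# Users 0 and 1 have identical check-in behavior; user 2 differs.
checkins = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0),
    (1, 0, 0), (1, 0, 1), (1, 1, 0),
    (2, 2, 2), (2, 2, 2),
]
tensor = build_checkin_tensor(checkins, n_users=3, n_categories=3, n_time_bins=3)
embeddings = user_factors(tensor, rank=2)
```

In this sketch, users with identical check-in patterns receive identical embeddings, and the rows of `embeddings` could then serve as input features for a separate classifier per demographic attribute.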
