Inference of demographic attributes based on mobile phone usage patterns and social network topology

Abstract

Mobile phone usage provides a wealth of information that can be used to better understand the demographic structure of a population. In this paper, we focus on the population of Mexican mobile phone users. We first present an observational study of mobile phone usage by gender and age group, and detect significant differences in phone usage among subgroups of the population. We then study how well different machine learning (ML) methods predict demographic features (namely, age and gender) of unlabeled users by leveraging individual calling patterns as well as the structure of the communication graph. We show that a specific implementation of a diffusion model, which harnesses the graph structure, performs significantly better than standard node-based ML methods. We provide details of the methodology together with an analysis of the robustness of our results to changes in the model parameters. Furthermore, by carefully examining the topological relations of the training nodes (seed nodes) to the rest of the nodes in the network, we identify topological metrics that have a direct influence on the performance of the algorithm.
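To make the idea of diffusing seed labels over a communication graph concrete, the following is a minimal sketch of a generic label-spreading iteration on a weighted graph. It is not the specific implementation evaluated in the paper: the function name `diffuse_labels`, the damping parameter `alpha`, the iteration count, and the symmetric degree normalization are all assumptions chosen for illustration.

```python
import numpy as np

def diffuse_labels(adjacency, seed_labels, alpha=0.8, n_iter=50):
    """Spread seed label scores over a graph (illustrative sketch only).

    adjacency   : (n, n) symmetric, non-negative weight matrix,
                  e.g. number of calls between two users (assumed input).
    seed_labels : (n, k) matrix with a one-hot row for each labeled
                  seed node and an all-zero row for each unlabeled node.
    alpha       : hypothetical damping factor in (0, 1); higher values
                  give more weight to neighbors than to the seed labels.
    """
    W = np.asarray(adjacency, dtype=float)
    Y = np.asarray(seed_labels, dtype=float)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated nodes: avoid division by zero
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Each step mixes the neighbors' current scores with the seed labels.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F

# Toy usage: a path of six users with one labeled seed at each end.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # user 0 is a seed of class 0
Y[5, 1] = 1.0  # user 5 is a seed of class 1
scores = diffuse_labels(W, Y)
predicted = scores.argmax(axis=1)  # class with the highest diffused score
```

In this toy graph each unlabeled user receives the class of the nearer seed, which is the kind of behavior a homophily-based diffusion relies on. In a real call graph the adjacency matrix would be sparse, so a sparse matrix representation would be the natural choice.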
